
refactor: reduce clone pressure in compressible trackers #2344

Open

sergeytimoshin wants to merge 14 commits into main from sergey/compressible-clone-cleanup

Conversation


@sergeytimoshin sergeytimoshin commented Mar 14, 2026

Summary by CodeRabbit

  • New Features

    • Prover startup health checks with retries and improved startup monitoring.
    • Concurrent per-tree worker subsystem for parallel new-tree processing.
    • Typed batch snapshot API for batched address/proof retrieval.
  • Bug Fixes

    • Prevented panics during queue/state parsing by replacing unwraps with safe handling.
    • Stricter proof/height validation to catch malformed proofs earlier.
  • Improvements

    • Reduced memory overhead via shared ownership patterns.
    • Broader error propagation across proof/client flows and stronger input validation.
  • Tests

    • Test harnesses updated to run pipelines in isolated runtimes with clearer failure reporting.


coderabbitai bot commented Mar 14, 2026

📝 Walkthrough
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main objective: reducing clone pressure in compressible trackers through refactoring, which is reflected throughout the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 70.00%.


@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch 4 times, most recently from 579d45a to 6d927c2 on March 14, 2026 19:12
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from 6d927c2 to ae11676 on March 14, 2026 19:21
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch 4 times, most recently from 3498eb9 to 0e56512 on March 14, 2026 21:34
@sergeytimoshin changed the title refactor: reduce clone pressure in compressible trackers on Mar 14, 2026
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from 0e56512 to 9e3e448 on March 15, 2026 00:37
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from 9e3e448 to 986107b on March 15, 2026 15:04
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from 986107b to d7783b8 on March 16, 2026 15:17
@sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from d7783b8 to 64ada13 on March 17, 2026 08:34
coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
sdk-libs/client/src/local_test_validator.rs (1)

71-97: 🧹 Nitpick | 🔵 Trivial

Command string built via concatenation may misbehave with special characters.

The command is assembled by directly interpolating program_id, program_path, upgrade_authority, and validator_args into a shell string. If any value contains spaces, quotes, or shell metacharacters (e.g., ; rm -rf /), the shell will interpret them.

While this is test infrastructure and the inputs are typically controlled, consider either:

  1. Escaping/quoting values (e.g., using shell-escape crate), or
  2. Passing arguments directly to Command without going through sh -c.

Example for option 2 (partial):

// Instead of sh -c with concatenated string:
Command::new(&format!("{}/cli/test_bin/run", project_root.trim()))
    .arg("test-validator")
    .args(["--sbf-program", &sbf_program.0, &sbf_program.1])
    // ... etc

This avoids shell interpretation entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 71 - 97, The loop
builds a shell command string by concatenating unescaped values (sbf_programs,
upgradeable_programs' program_id/program_path/upgrade_authority,
validator_args), which risks shell interpretation of special characters; change
to constructing the process invocation with std::process::Command (e.g., the run
binary created via format!("{}/cli/test_bin/run", project_root.trim()) and the
"test-validator" subcommand) and add each flag/argument with .arg or .args
instead of string concatenation, mapping each sbf_program to
.args(&["--sbf-program", &sbf_program.0, &sbf_program.1]), each
upgradeable_program to .args(&["--upgradeable-program",
&upgradeable_program.program_id, &upgradeable_program.program_path,
&upgradeable_program.upgrade_authority]), conditionally add "--skip-prover" and
"--use-surfpool" with .arg, and append config.validator_args via .args to avoid
passing through a shell; alternatively, if you must build a shell string, apply
proper escaping using a crate like shell-escape for the same fields.
forester/src/epoch_manager.rs (2)

4775-4788: ⚠️ Potential issue | 🔴 Critical

Shutdown currently skips the NewTreeWorker cleanup path.

When the shutdown branch wins this tokio::select!, epoch_manager.run() is dropped rather than allowed to reach its epilogue. That means shutdown_new_tree_workers() never runs, so any already spawned NewTreeWorker threads keep their own Tokio runtimes alive and can continue submitting work after the service reports shutdown. Please plumb the signal into EpochManager::run() or expose an explicit async shutdown() and await it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 4775 - 4788, The shutdown branch
currently wins the tokio::select! and drops epoch_manager.run(), preventing
NewTreeWorker cleanup (shutdown_new_tree_workers()) from running; modify the
code so the shutdown signal is propagated into EpochManager rather than
canceling it — either add a cancellation-aware entry point (e.g., change
EpochManager::run() to accept a shutdown signal/Receiver) or implement an
explicit async EpochManager::shutdown() that you call and await here when
shutdown wins; ensure you reference EpochManager::run(),
EpochManager::shutdown() (if added), and the
shutdown_new_tree_workers()/NewTreeWorker cleanup path so the manager completes
its epilogue before returning.

558-574: ⚠️ Potential issue | 🟠 Major

Abort discovery before draining tracked new-tree workers.

tree_discovery_handle is still live while shutdown_new_tree_workers() is taking and cancelling the current worker list. A tree discovered in that window can register a fresh NewTreeWorker after the vector has been drained, so it escapes shutdown and keeps processing past the manager exit path.

Suggested fix
-        monitor_handle.abort();
-        self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
-            .await;
+        monitor_handle.abort();
+        drop(_guard);
+        self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
+            .await;

Also applies to: 678-681

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 558 - 574, The scopeguard closure
currently aborts current_previous_handle, tree_discovery_handle, and
balance_check_handle together which allows tree_discovery_handle to remain live
while shutdown_new_tree_workers() drains and cancels workers; change the
shutdown sequence so tree_discovery_handle is aborted before any call that
drains or cancels NewTreeWorker instances (i.e., ensure
tree_discovery_handle.abort() runs prior to invoking shutdown_new_tree_workers()
or draining the worker vector), update the closure or surrounding teardown logic
to call tree_discovery_handle.abort() first and then perform
shutdown_new_tree_workers(), and apply the same ordering fix at the other
teardown site referenced around lines 678-681 so discovery cannot register new
workers after the drain starts.
forester/src/processor/v2/strategy/address.rs (1)

170-205: ⚠️ Potential issue | 🔴 Critical

The snapshot indexing is misaligned with the data arrays.

In fetch_queue_data, data_start_index is set to tree_next_insertion_index() (line 170). However, build_proof_job calls get_batch_snapshot(start, batch_end) where start is a relative offset (line 273). The get_batch_snapshot implementation treats this offset as a direct index into the data arrays—using data.addresses.get(start..end) (line 502 in helpers.rs).

Since the data arrays are populated from StreamingAddressQueue::start_index() (the queue window's absolute start), but data_start_index is now tree_next_insertion_index() (a different value), the snapshot lookup will access the wrong address slice. This breaks the overlap guard (reconcile_alignment) and can cause replay of already-inserted addresses.

Either:

  • Rebase get_batch_snapshot calls to use start + (streaming_queue.start_index() - streaming_queue.tree_next_insertion_index()), or
  • Keep data_start_index = streaming_queue.start_index() and handle tree alignment separately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/strategy/address.rs` around lines 170 - 205, The
data_start_index returned by fetch_queue_data is set to
streaming_queue.tree_next_insertion_index(), which misaligns with how
get_batch_snapshot interprets the start offset (it indexes directly into the
data arrays populated from StreamingAddressQueue::start_index()); update
fetch_queue_data to set data_start_index = streaming_queue.start_index()
(preserving tree_next_insertion_index use for staging tree creation), or
alternatively, keep data_start_index as-is but adjust build_proof_job calls to
rebase snapshot offsets before calling get_batch_snapshot (compute
adjusted_start = start + (streaming_queue.start_index() -
streaming_queue.tree_next_insertion_index()) and pass adjusted_start), and
ensure reconcile_alignment still compares indices using the same base (use
StreamingAddressQueue::start_index() when checking overlap).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 914-994: This spawns an unbounded OS thread + dedicated Tokio
runtime per new tree (inside the block creating cancel_tx/completion_tx and
thread_handle) which can oversubscribe the host; instead, route new-tree work
through the existing tree-processing concurrency limiter used by normal trees
(the same semaphore/pool guarded by max_parallel_tree_workers()/the struct's
worker semaphore) and reuse the existing worker runtime rather than building a
new Tokio runtime per tree. Change the code that spawns a thread and builds a
runtime for each new tree (the closure calling process_queue and using
cancel_rx/completion_rx) to acquire the common semaphore/pool before scheduling
work, submit the processing task onto the shared runtime/task queue (or spawn
onto existing worker threads), and only keep the cancellation/completion
channels and register_new_tree_worker/NewTreeWorker for lifecycle tracking
without creating a dedicated runtime per tree.

In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Line 90: Remove the duplicate prover startup: the call to spawn_prover() after
init(Some(LightValidatorConfig { enable_prover: true, ... })) should be deleted
because init already starts the prover when enable_prover is true; locate the
test in batched_state_async_indexer_test.rs and remove the explicit
spawn_prover() invocation to avoid binding the same port twice and the
subsequent unwrap() panic.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 219-230: The code truncates new_element_values to batch_len but
not the related low_element_* slices; to make intent explicit and avoid indexing
surprises, create truncated slice bindings for each low_element slice (e.g.,
low_element_values, low_element_next_values, low_element_indices,
low_element_next_indices, and any other per-element input slices) using the same
..batch_len slice pattern before the loop/processing that uses them, and then
use those truncated bindings everywhere instead of indexing the original
vectors; keep the existing
staged_changelog/staged_indexed_changelog/staged_sparse_merkle_tree logic
unchanged.

In `@prover/client/src/prover.rs`:
- Around line 85-86: The current health-check boolean combines TCP read success
and a curl fallback which can mask non-200 HTTP responses; change the logic in
the health check block (the bytes_read check in prover.rs) so that when
bytes_read > 0 you return the TCP result directly by calling
has_http_ok_status(&response[..bytes_read]) and only call
health_check_once_with_curl(timeout) when the TCP path failed (e.g., bytes_read
== 0 or earlier connect/write/read errors); update the conditional controlling
those calls (the bytes_read branch) to implement this explicit fall-through
behavior.
- Around line 175-202: The local binding spawn_result captures the result of the
async block but the block returns () so the variable is unused; remove the
unnecessary spawn_result binding and simply await the async block inline (or use
let _ = ...await) where the Command::new(&prover_path) / wait_for_prover_health
/ monitor_prover_child logic runs, then keep IS_LOADING.store(false,
Ordering::Release) afterwards—this cleans up the dead variable while preserving
the startup, health-check, and child-monitor behavior.

In `@prover/client/tests/batch_address_append.rs`:
- Around line 109-124: The call to
get_batch_address_append_circuit_inputs(...).unwrap() in the warmup phase
returns circuit inputs that are currently discarded; update the call site (the
invocation of get_batch_address_append_circuit_inputs with warmup_* args) to
make the intent explicit by either assigning the result to a clearly named
placeholder (e.g. let _warmup_inputs = ...) or by adding a concise comment
before the call explaining "result intentionally discarded — only side effects
on sparse_merkle_tree, changelog, and indexed_changelog are needed for warmup",
keeping the unwrap as-is; reference the symbols
get_batch_address_append_circuit_inputs, sparse_merkle_tree, changelog, and
indexed_changelog when making the change.
- Line 128: The variable remaining_values is misleading because it binds the
full slice &new_element_values[..]; rename remaining_values to batch_values (or
remove the alias and use new_element_values directly) and update every usage of
remaining_values in the test to the new name so the intent matches the data
(references: remaining_values, new_element_values in the batch_address_append
test).

In `@sdk-libs/client/src/indexer/photon_indexer.rs`:
- Around line 1145-1162: The second, redundant validation can be removed: inside
the block that computes expected_siblings and expected_total (using
STATE_MERKLE_TREE_HEIGHT and STATE_MERKLE_TREE_CANOPY_DEPTH) keep only the first
check that compares x.proof.len() to expected_total and the subsequent
computation of proof_len if needed, but drop the impossible second check that
compares proof_len to expected_siblings (the IndexerError::InvalidParameters
branch referencing proof_len vs expected_siblings) since it can never fail when
the first check passes.
- Around line 1693-1695: The assignment of tree_next_insertion_index currently
uses aq.start_index (a pagination offset) which is wrong; replace that value
with the actual next leaf index from the tree state—e.g., call
address_tree_bundle.right_most_index() (or derive from queue_indices/queue state
if Photon API is missing the field) when setting tree_next_insertion_index in
the struct initializer where start_index, tree_next_insertion_index, root_seq
are set; ensure tree_next_insertion_index represents the sparse tree insertion
point used to initialize staging trees rather than the pagination offset.

In `@sdk-libs/client/src/indexer/types/queue.rs`:
- Around line 198-203: The lookup-based proof reconstruction uses the unstable
is_multiple_of API when computing sibling_pos in reconstruct_proof_with_lookup;
replace the unstable call with a stable parity check (e.g., test pos % 2 == 0 or
pos & 1 == 0) when deciding sibling_pos, and ensure the same replacement is
applied to the sibling index computation inside reconstruct_proof_with_lookup so
behavior remains identical but uses stable Rust methods.
- Around line 274-286: Replace the unstable is_multiple_of call in the test
helper's loop with a stable check using modulus: instead of
pos.is_multiple_of(2) use pos % 2 == 0 (or equivalent) when computing
sibling_pos inside the loop that constructs node_index (the block that inserts
into nodes via nodes.entry(node_index).or_insert_with(||
hash_from_node(node_index))) and ensure the same change is applied to any other
occurrences in build_queue_data so pos is updated correctly before pos /= 2.
- Around line 175-177: Add a doc comment to the proof_count method explaining
that it returns the number of complete proof entries by taking the minimum of
addresses.len() and low_element_indices.len(); document that if those two
vectors diverge the method reports only the count of fully-paired elements
(i.e., the number of valid address→low_element_index pairs) so callers
understand the invariant and why min(...) is used; mention the related fields
addresses and low_element_indices in the comment for clarity.

In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 164-165: The loop that iterates over
cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) causes MissingAtaContext.index
to be the local chunk index instead of the global ATA index; fix build_ata_load
error reporting by computing and preserving the global index: iterate with
enumerate over chunks (e.g., for (chunk_idx, (chunk, proof)) in
cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs).enumerate()), compute
chunk_start = chunk_idx * MAX_ATAS_PER_IX, and when constructing any
MissingAtaContext (inside build_ata_load or just before calling it) set index =
chunk_start + local_index (where local_index is the position within chunk) so
the reported MissingAtaContext.index refers to the original cold_atas index.
- Around line 135-139: The three concurrent pipelines (fetch_proof_batches,
fetch_proofs_batched, fetch_individual_proofs) each buffer up to
PROOF_FETCH_CONCURRENCY, creating up to 3x the intended concurrent
get_validity_proof calls; change the implementation to use a single shared
concurrency limiter instead of independent buffers. Add a shared limiter (e.g.,
a tokio::sync::Semaphore or an Arc<Limiter>) and pass it into
fetch_proof_batches, fetch_proofs_batched, and fetch_individual_proofs; have
each proof-fetching operation acquire a permit before calling get_validity_proof
and release it after, so the global concurrency across all pipelines is bounded
by PROOF_FETCH_CONCURRENCY (or another configured global limit) rather than
per-pipeline. Ensure constructors/signatures for those functions accept the
shared limiter and use it wherever get_validity_proof is awaited.

In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 119-128: The spawned validator process (created with
Command::new("sh")... and assigned to the dropped variable _child when
use_surfpool is false) is intentionally detached and lacks any cleanup path;
change the implementation to retain and expose the Child handle so callers/tests
can terminate the validator: either store the Child in a shared structure (e.g.,
a static Mutex/OnceCell) or return it from the function that starts the
validator (so callers receive the Child and can call kill/await), and update any
callers/tests to call .kill()/.wait() or drop appropriately during teardown;
ensure the symbol names to change are the creation site where _child is assigned
and the surrounding function that gates on use_surfpool.
- Around line 60-62: The code silently no-ops when get_project_root() returns
None; update the containing function (e.g., the function that calls
get_project_root() in local_test_validator.rs) to return a Result and propagate
an error instead of doing nothing, or at minimum log a clear warning and return
an Err variant so the caller knows the validator failed to start; specifically,
replace the current early-return/no-op on None from get_project_root() with a
branch that logs an error (e.g., via log::error!/warn!) or uses anyhow::bail and returns
Err(...) from the function that constructs and spawns the command (referencing
get_project_root and the function that formats/executes the "cli/test_bin/run
test-validator" command) so failures are observable by callers/tests.

In `@sdk-libs/client/src/utils.rs`:
- Around line 36-46: The code builds light_path from the git rev-parse output
without checking Command::output() success, so on git failure you may produce an
invalid path or panic; update the block around the
std::process::Command::new("git") call to check output.status.success() (or
handle Err from .output()) and return None (or an appropriate error) if the
command failed, only parsing stdout into light_protocol_toplevel and
constructing light_path when the status is success; reference the existing
Command invocation, the light_protocol_toplevel variable and the light_path
PathBuf construction to find and fix the logic.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2593-2607: The retry loop currently uses the `?` operator when
reading the response body and when decoding via `build_compressed_proof`, which
causes early exit on transient failures; change the
`response_result.text().await.map_err(...) ?` and
`build_compressed_proof(&body)?` sites to explicit error handling: capture the
errors into `last_error` (e.g., set last_error =
Some(IndexerError::CustomError(...)) or wrap the build error similarly) and
continue the loop instead of returning, and only return
`Ok(ValidityProofWithContext { accounts: account_proof_inputs, addresses:
address_proof_inputs, proof: ... })` when both reading and decoding succeed;
ensure you reference `response_result`, `build_compressed_proof`,
`ValidityProofWithContext`, `account_proof_inputs`, `address_proof_inputs`, and
`last_error` when implementing the changes.

In `@sdk-libs/sdk-types/src/interface/program/decompression/processor.rs`:
- Line 129: The public field token_seeds uses the confusing type
Vec<Vec<Vec<u8>>>; define a small alias or newtype (e.g., type SeedsPerSigner =
Vec<Vec<u8>>; or pub struct SignerSeedGroups(pub Vec<SeedsPerSigner>);) to
represent “one seed group per signer PDA” and replace token_seeds:
Vec<Vec<Vec<u8>>> with token_seeds: SeedsPerSignerGroups (or the chosen name)
and update the other occurrence referenced (the second use at the later
location) to use the new alias/newtype so the public API is readable and less
error-prone.

In `@sdk-libs/sdk-types/src/interface/program/decompression/token.rs`:
- Around line 151-153: The code currently only pushes the vault PDA seed group
into ctx.token_seeds (ctx.token_seeds.push(seeds.iter().map(|seed|
seed.to_vec()).collect() in the decompression token flow), but when
token::owner_seeds ≠ token::seeds derive_owner() produces a different owner PDA
and processor.rs later marks the owner account (account_metas[data.owner as
usize + transfer2_packed_start].is_signer = true) without providing signer seeds
for it; fix by including the owner PDA seeds as a separate signer seed group
when owner_seeds differs from seeds (i.e., push token::owner_seeds mapped to
Vec<u8> into ctx.token_seeds in addition to the vault seeds, making sure the
order of seed groups matches how account_metas are constructed for the Transfer2
CPI), or alternatively change the signer-marking logic to only mark the owner as
signer when the owner PDA equals the vault PDA.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0c1415d-1a73-4dc1-9635-3733788e8dcd

📥 Commits

Reviewing files that changed from the base of the PR and between 2644529 and 64ada13.

⛔ Files ignored due to path filters (9)
  • Cargo.lock is excluded by !**/*.lock and included by none
  • forester-utils/src/address_staging_tree.rs is excluded by none and included by none
  • program-tests/utils/src/actions/legacy/instructions/transfer2.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
  • sparse-merkle-tree/src/indexed_changelog.rs is excluded by none and included by none
  • sparse-merkle-tree/tests/indexed_changelog.rs is excluded by none and included by none
📒 Files selected for processing (48)
  • forester/src/compressible/ctoken/compressor.rs
  • forester/src/compressible/ctoken/state.rs
  • forester/src/compressible/ctoken/types.rs
  • forester/src/compressible/mint/compressor.rs
  • forester/src/compressible/mint/state.rs
  • forester/src/compressible/mint/types.rs
  • forester/src/compressible/pda/compressor.rs
  • forester/src/epoch_manager.rs
  • forester/src/forester_status.rs
  • forester/src/metrics.rs
  • forester/src/priority_fee.rs
  • forester/src/processor/v2/helpers.rs
  • forester/src/processor/v2/processor.rs
  • forester/src/processor/v2/proof_cache.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/processor/v2/strategy/address.rs
  • forester/tests/e2e_test.rs
  • forester/tests/legacy/batched_state_async_indexer_test.rs
  • forester/tests/legacy/test_utils.rs
  • forester/tests/test_batch_append_spent.rs
  • forester/tests/test_compressible_ctoken.rs
  • forester/tests/test_compressible_mint.rs
  • forester/tests/test_indexer_interface.rs
  • forester/tests/test_utils.rs
  • prover/client/src/constants.rs
  • prover/client/src/errors.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_address_append.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/proof.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/client/src/interface/pack.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/client/src/utils.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/sdk-types/src/interface/account/token_seeds.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/pda.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/processor.rs
  • sdk-libs/sdk-types/src/interface/program/decompression/token.rs

Comment on lines +219 to +230
let new_element_values = &new_element_values[..batch_len];
let mut staged_changelog = changelog.clone();
let mut staged_indexed_changelog = indexed_changelog.clone();
let mut staged_sparse_merkle_tree = sparse_merkle_tree.clone();
let initial_changelog_len = staged_changelog.len();
let mut new_root = [0u8; 32];
let mut low_element_circuit_merkle_proofs = Vec::with_capacity(batch_len);
let mut new_element_circuit_merkle_proofs = Vec::with_capacity(batch_len);
let mut patched_low_element_next_values = Vec::with_capacity(batch_len);
let mut patched_low_element_next_indices = Vec::with_capacity(batch_len);
let mut patched_low_element_values = Vec::with_capacity(batch_len);
let mut patched_low_element_indices = Vec::with_capacity(batch_len);

🧹 Nitpick | 🔵 Trivial

Consider truncating all slices consistently for clarity.

You truncate new_element_values at line 219 but access the low_element_* slices by index. While safe (you validated lengths), truncating all of them would make the code self-documenting:

let low_element_values = &low_element_values[..batch_len];
let low_element_next_values = &low_element_next_values[..batch_len];
// ... etc

This makes it immediately clear to readers that all slices are working with exactly batch_len elements.
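
As a standalone illustration of that pattern (with hypothetical field names, not the actual proof-input slices), truncating up front makes every later access visibly bounded:

```rust
/// Truncate every per-element slice to `batch_len` up front so each later
/// access is visibly bounded. The parameter names here are illustrative
/// stand-ins for the real proof-input fields.
fn truncate_inputs<'a>(
    low_element_values: &'a [[u8; 32]],
    new_element_values: &'a [[u8; 32]],
    batch_len: usize,
) -> (&'a [[u8; 32]], &'a [[u8; 32]]) {
    // Panics if batch_len exceeds either slice, surfacing a length mismatch
    // immediately instead of at some later index access.
    (
        &low_element_values[..batch_len],
        &new_element_values[..batch_len],
    )
}
```

The same slicing would apply to each of the remaining `low_element_*` inputs.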

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 219 - 230, The code truncates new_element_values to batch_len but not the
related low_element_* slices; to make intent explicit and avoid indexing
surprises, create truncated slice bindings for each low_element slice (e.g.,
low_element_values, low_element_next_values, low_element_indices,
low_element_next_indices, and any other per-element input slices) using the same
..batch_len slice pattern before the loop/processing that uses them, and then
use those truncated bindings everywhere instead of indexing the original
vectors; keep the existing
staged_changelog/staged_indexed_changelog/staged_sparse_merkle_tree logic
unchanged.

Comment on lines +85 to +86
bytes_read > 0
&& (has_http_ok_status(&response[..bytes_read]) || health_check_once_with_curl(timeout))

🧹 Nitpick | 🔵 Trivial

Health check logic may produce unexpected behavior.

The condition bytes_read > 0 && (has_http_ok_status(...) || health_check_once_with_curl(...)) means:

  1. If TCP read succeeds but returns non-200, we still call curl as a fallback
  2. If curl succeeds, we return true even though the TCP path got a different status

This could mask issues where the server is returning errors. Consider:

🔧 Suggested simplification
-    bytes_read > 0
-        && (has_http_ok_status(&response[..bytes_read]) || health_check_once_with_curl(timeout))
+    bytes_read > 0 && has_http_ok_status(&response[..bytes_read])

If TCP health check succeeds with a valid 200 response, use that result directly. Only fall back to curl when TCP operations fail (connect/write/read errors), which you already handle above.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    bytes_read > 0
-        && (has_http_ok_status(&response[..bytes_read]) || health_check_once_with_curl(timeout))
+    bytes_read > 0 && has_http_ok_status(&response[..bytes_read])
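
The decision the comment argues for can be sketched as a pure function (`has_http_ok_status` and the curl fallback are passed in as stand-ins for the prover's helpers; the real code inlines this logic):

```rust
/// TCP-first health decision: if the TCP path produced a response, trust its
/// status directly, so a non-200 body is reported as unhealthy instead of
/// being masked by a curl fallback. Only fall back to curl when the TCP path
/// produced nothing at all.
fn is_healthy(
    tcp_bytes: Option<&[u8]>,
    has_http_ok_status: impl Fn(&[u8]) -> bool,
    curl_fallback: impl Fn() -> bool,
) -> bool {
    match tcp_bytes {
        // TCP read succeeded with data: the status answer is authoritative.
        Some(body) if !body.is_empty() => has_http_ok_status(body),
        // TCP connect/write/read failed or returned nothing: try curl.
        _ => curl_fallback(),
    }
}
```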
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 85 - 86, The current health-check
boolean combines TCP read success and a curl fallback which can mask non-200
HTTP responses; change the logic in the health check block (the bytes_read check
in prover.rs) so that when bytes_read > 0 you return the TCP result directly by
calling has_http_ok_status(&response[..bytes_read]) and only call
health_check_once_with_curl(timeout) when the TCP path failed (e.g., bytes_read
== 0 or earlier connect/write/read errors); update the conditional controlling
those calls (the bytes_read branch) to implement this explicit fall-through
behavior.

Comment on lines +175 to +202
  let spawn_result = async {
      let mut child = Command::new(&prover_path)
          .arg("start-prover")
          .spawn()
-         .expect("Failed to start prover process");
-
-     let _ = command.wait_with_output();
+         .unwrap_or_else(|error| panic!("Failed to start prover process: {error}"));

-     let health_result = health_check(120, 1).await;
-     if health_result {
-         info!("Prover started successfully");
-     } else {
-         panic!("Failed to start prover, health check failed.");
+     match wait_for_prover_health(
+         STARTUP_HEALTH_CHECK_RETRIES,
+         Duration::from_secs(1),
+         &mut child,
+     )
+     .await
+     {
+         Ok(()) => {
+             monitor_prover_child(child);
+             info!("Prover started successfully");
+         }
+         Err(error) => {
+             let _ = child.kill();
+             let _ = child.wait();
+             panic!("Failed to start prover: {error}");
+         }
      }
  }
  .await;

  IS_LOADING.store(false, Ordering::Release);
  spawn_result

⚠️ Potential issue | 🟡 Minor

Unused spawn_result variable.

The async block at lines 175-199 returns () (unit type), so spawn_result at line 202 is a no-op expression that does nothing. This appears to be leftover from a refactor.

🧹 Suggested cleanup
-        let spawn_result = async {
+        async {
             let mut child = Command::new(&prover_path)
                 .arg("start-prover")
                 .spawn()
                 .unwrap_or_else(|error| panic!("Failed to start prover process: {error}"));

             match wait_for_prover_health(
                 STARTUP_HEALTH_CHECK_RETRIES,
                 Duration::from_secs(1),
                 &mut child,
             )
             .await
             {
                 Ok(()) => {
                     monitor_prover_child(child);
                     info!("Prover started successfully");
                 }
                 Err(error) => {
                     let _ = child.kill();
                     let _ = child.wait();
                     panic!("Failed to start prover: {error}");
                 }
             }
         }
-        .await;
+        .await;

         IS_LOADING.store(false, Ordering::Release);
-        spawn_result
📝 Committable suggestion


Suggested change
-        let spawn_result = async {
+        async {
             let mut child = Command::new(&prover_path)
                 .arg("start-prover")
                 .spawn()
                 .unwrap_or_else(|error| panic!("Failed to start prover process: {error}"));

             match wait_for_prover_health(
                 STARTUP_HEALTH_CHECK_RETRIES,
                 Duration::from_secs(1),
                 &mut child,
             )
             .await
             {
                 Ok(()) => {
                     monitor_prover_child(child);
                     info!("Prover started successfully");
                 }
                 Err(error) => {
                     let _ = child.kill();
                     let _ = child.wait();
                     panic!("Failed to start prover: {error}");
                 }
             }
         }
         .await;

         IS_LOADING.store(false, Ordering::Release);
-        spawn_result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/prover.rs` around lines 175 - 202, The local binding
spawn_result captures the result of the async block but the block returns () so
the variable is unused; remove the unnecessary spawn_result binding and simply
await the async block inline (or use let _ = ...await) where the
Command::new(&prover_path) / wait_for_prover_health / monitor_prover_child logic
runs, then keep IS_LOADING.store(false, Ordering::Release) afterwards—this
cleans up the dead variable while preserving the startup, health-check, and
child-monitor behavior.
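
The bounded startup check discussed here can be sketched synchronously (the name and signature are assumptions; the real wait_for_prover_health is async and also watches the child process):

```rust
use std::time::Duration;

/// Retry a health probe up to `retries` times, sleeping between attempts.
/// A minimal synchronous sketch of the startup-health-check shape; the async
/// original would use tokio::time::sleep(delay).await instead.
fn wait_until_healthy(
    retries: u32,
    delay: Duration,
    mut check: impl FnMut() -> bool,
) -> Result<(), String> {
    for _ in 0..retries {
        if check() {
            return Ok(());
        }
        std::thread::sleep(delay);
    }
    Err(format!("health check failed after {retries} attempts"))
}
```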

Comment on lines +119 to +128
 let _child = Command::new("sh")
     .arg("-c")
     .arg(path)
     .arg(command)
     .stdin(Stdio::null())
     .stdout(Stdio::null())
     .stderr(Stdio::null())
     .spawn()
     .expect("Failed to start server process");
-std::mem::drop(child);
+// Intentionally detaching the spawned child; the caller only waits
+// for the validator services to become available.

🧹 Nitpick | 🔵 Trivial

Detached child process has no cleanup path.

When use_surfpool is false, the spawned process is intentionally detached (the handle is assigned to _child and dropped). This means:

  1. If the test crashes, the validator keeps running as an orphan.
  2. There's no way for the caller to terminate the validator gracefully.

The comment acknowledges this, but for test robustness you might consider storing the Child handle in a shared structure (e.g., a static or returned handle) so tests can clean up on teardown.
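
One shape such a shared teardown structure could take (a sketch in which closures stand in for std::process::Child handles so the example stays self-contained; a real version would store the Child and call kill()/wait()):

```rust
use std::sync::{Mutex, OnceLock};

/// Process-wide registry of teardown callbacks for detached test resources.
static TEARDOWNS: OnceLock<Mutex<Vec<Box<dyn FnOnce() + Send>>>> = OnceLock::new();

/// Register a cleanup action to run at test teardown (e.g. killing a
/// detached validator process).
fn register_teardown(f: impl FnOnce() + Send + 'static) {
    TEARDOWNS
        .get_or_init(|| Mutex::new(Vec::new()))
        .lock()
        .unwrap()
        .push(Box::new(f));
}

/// Drain and run every registered cleanup action exactly once.
fn run_teardowns() {
    if let Some(registry) = TEARDOWNS.get() {
        for f in registry.lock().unwrap().drain(..) {
            f();
        }
    }
}
```

This still leaves orphans on a hard crash, but gives normal test teardown a hook to terminate the validator.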

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 119 - 128, The
spawned validator process (created with Command::new("sh")... and assigned to
the dropped variable _child when use_surfpool is false) is intentionally
detached and lacks any cleanup path; change the implementation to retain and
expose the Child handle so callers/tests can terminate the validator: either
store the Child in a shared structure (e.g., a static Mutex/OnceCell) or return
it from the function that starts the validator (so callers receive the Child and
can call kill/await), and update any callers/tests to call .kill()/.wait() or
drop appropriately during teardown; ensure the symbol names to change are the
creation site where _child is assigned and the surrounding function that gates
on use_surfpool.

Comment on lines +36 to +46
let output = std::process::Command::new("git")
.arg("rev-parse")
.arg("--show-toplevel")
.output()
.expect("Failed to get top-level directory");
let light_protocol_toplevel = std::str::from_utf8(&output.stdout)
.ok()?
.trim_end_matches("\r\n")
.trim_end_matches('\n')
.to_string();
let light_path = PathBuf::from(format!("{}/target/deploy/", light_protocol_toplevel));

⚠️ Potential issue | 🟠 Major

Handle git rev-parse failure explicitly before building the path.

This branch does not check output.status.success(). On failure, it can still derive an invalid path (e.g., "/target/deploy/"), and .expect(...) can panic despite the function returning Option<PathBuf>.

Suggested fix
-        let output = std::process::Command::new("git")
+        let output = std::process::Command::new("git")
             .arg("rev-parse")
             .arg("--show-toplevel")
             .output()
-            .expect("Failed to get top-level directory");
+            .ok()?;
+        if !output.status.success() {
+            return None;
+        }
         let light_protocol_toplevel = std::str::from_utf8(&output.stdout)
             .ok()?
             .trim_end_matches("\r\n")
             .trim_end_matches('\n')
             .to_string();
-        let light_path = PathBuf::from(format!("{}/target/deploy/", light_protocol_toplevel));
+        let light_path = PathBuf::from(light_protocol_toplevel).join("target/deploy");
         Some(light_path)
📝 Committable suggestion


Suggested change
         let output = std::process::Command::new("git")
             .arg("rev-parse")
             .arg("--show-toplevel")
             .output()
-            .expect("Failed to get top-level directory");
+            .ok()?;
+        if !output.status.success() {
+            return None;
+        }
         let light_protocol_toplevel = std::str::from_utf8(&output.stdout)
             .ok()?
             .trim_end_matches("\r\n")
             .trim_end_matches('\n')
             .to_string();
-        let light_path = PathBuf::from(format!("{}/target/deploy/", light_protocol_toplevel));
+        let light_path = PathBuf::from(light_protocol_toplevel).join("target/deploy");
         Some(light_path)
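
The success-checked parsing can be factored into a pure helper for testability (a hypothetical refactor of the suggested fix, with an extra guard for empty output added here):

```rust
use std::path::PathBuf;

/// Build the deploy path only when `git rev-parse --show-toplevel` actually
/// succeeded. `success` mirrors output.status.success(); `stdout` is the raw
/// command output. Illustrative helper, not the crate's current API.
fn deploy_dir_from_git_output(success: bool, stdout: &[u8]) -> Option<PathBuf> {
    if !success {
        return None;
    }
    let toplevel = std::str::from_utf8(stdout)
        .ok()?
        .trim_end_matches("\r\n")
        .trim_end_matches('\n')
        .to_string();
    // Guard against an empty toplevel, which would otherwise yield a
    // meaningless relative path like "target/deploy".
    if toplevel.is_empty() {
        return None;
    }
    Some(PathBuf::from(toplevel).join("target/deploy"))
}
```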
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/utils.rs` around lines 36 - 46, The code builds
light_path from the git rev-parse output without checking Command::output()
success, so on git failure you may produce an invalid path or panic; update the
block around the std::process::Command::new("git") call to check
output.status.success() (or handle Err from .output()) and return None (or an
appropriate error) if the command failed, only parsing stdout into
light_protocol_toplevel and constructing light_path when the status is success;
reference the existing Command invocation, the light_protocol_toplevel variable
and the light_path PathBuf construction to find and fix the logic.

Comment on lines +2593 to +2607
match response_result {
Ok(response_result) => {
let status = response_result.status();
let body = response_result.text().await.map_err(|error| {
IndexerError::CustomError(format!(
"failed to read prover response body: {error}"
))
})?;

if status.is_success() {
return Ok(ValidityProofWithContext {
accounts: account_proof_inputs,
addresses: address_proof_inputs,
proof: build_compressed_proof(&body)?.into(),
});

⚠️ Potential issue | 🟠 Major

Don't bypass the retry loop on body/proof decode failures.

Line 2596 and Line 2606 still use ? inside the retry loop, so a transient truncated response or malformed proof payload exits immediately instead of consuming the remaining attempts. These failures should update last_error and continue, just like send/status failures.

🛠️ Suggested fix
                 match response_result {
                     Ok(response_result) => {
                         let status = response_result.status();
-                        let body = response_result.text().await.map_err(|error| {
-                            IndexerError::CustomError(format!(
-                                "failed to read prover response body: {error}"
-                            ))
-                        })?;
-
-                        if status.is_success() {
-                            return Ok(ValidityProofWithContext {
-                                accounts: account_proof_inputs,
-                                addresses: address_proof_inputs,
-                                proof: build_compressed_proof(&body)?.into(),
-                            });
-                        }
-
-                        let body_preview: String = body.chars().take(512).collect();
-                        last_error = format!(
-                            "prover returned HTTP {status} for validity proof request: {body_preview}"
-                        );
+                        match response_result.text().await {
+                            Ok(body) => {
+                                if status.is_success() {
+                                    match build_compressed_proof(&body) {
+                                        Ok(proof) => {
+                                            return Ok(ValidityProofWithContext {
+                                                accounts: account_proof_inputs,
+                                                addresses: address_proof_inputs,
+                                                proof: proof.into(),
+                                            });
+                                        }
+                                        Err(error) => {
+                                            let body_preview: String =
+                                                body.chars().take(512).collect();
+                                            last_error = format!(
+                                                "failed to parse prover proof response: {error}; body: {body_preview}"
+                                            );
+                                        }
+                                    }
+                                } else {
+                                    let body_preview: String = body.chars().take(512).collect();
+                                    last_error = format!(
+                                        "prover returned HTTP {status} for validity proof request: {body_preview}"
+                                    );
+                                }
+                            }
+                            Err(error) => {
+                                last_error =
+                                    format!("failed to read prover response body: {error}");
+                            }
+                        }
                     }
                     Err(error) => {
                         last_error =
                             format!("failed to contact prover for validity proof: {error}");
                     }
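
The retry shape the comment asks for, where every failure records last_error and consumes an attempt, can be sketched generically (not the indexer's actual types):

```rust
/// Run `op` up to `attempts` times. Every failure, including body-read and
/// decode failures, updates last_error and continues; only a full success
/// returns early. Generic sketch of the control flow, not the real client.
fn retry_all_failures<T>(
    attempts: u32,
    mut op: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    let mut last_error = String::from("no attempts made");
    for _ in 0..attempts {
        match op() {
            Ok(value) => return Ok(value),
            // No `?` here: record the error and keep looping.
            Err(error) => last_error = error,
        }
    }
    Err(last_error)
}
```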
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2593 - 2607,
The retry loop currently uses the `?` operator when reading the response body
and when decoding via `build_compressed_proof`, which causes early exit on
transient failures; change the `response_result.text().await.map_err(...) ?` and
`build_compressed_proof(&body)?` sites to explicit error handling: capture the
errors into `last_error` (e.g., set last_error =
Some(IndexerError::CustomError(...)) or wrap the build error similarly) and
continue the loop instead of returning, and only return
`Ok(ValidityProofWithContext { accounts: account_proof_inputs, addresses:
address_proof_inputs, proof: ... })` when both reading and decoding succeed;
ensure you reference `response_result`, `build_compressed_proof`,
`ValidityProofWithContext`, `account_proof_inputs`, `address_proof_inputs`, and
`last_error` when implementing the changes.

 pub in_tlv: Option<Vec<Vec<ExtensionInstructionData>>>,
 #[cfg(feature = "token")]
-pub token_seeds: Vec<Vec<u8>>,
+pub token_seeds: Vec<Vec<Vec<u8>>>,

🧹 Nitpick | 🔵 Trivial

Consider naming the new signer-seed shape.

Vec<Vec<Vec<u8>>> is hard to parse in public fields. A small alias or newtype for “one seed group per signer PDA” would make this contract much easier to follow and reduce accidental flattening in future call sites.

Also applies to: 299-299
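
A minimal sketch of the suggested alias plus newtype (the names are suggestions, not the crate's existing API):

```rust
/// All seeds for one signer PDA.
type SeedsPerSigner = Vec<Vec<u8>>;

/// Newtype over "one seed group per signer PDA", preventing accidental
/// flattening of the outer dimension at call sites.
pub struct SignerSeedGroups(pub Vec<SeedsPerSigner>);

impl SignerSeedGroups {
    pub fn new() -> Self {
        SignerSeedGroups(Vec::new())
    }

    /// Push the complete seed group for one signer PDA.
    pub fn push_group(&mut self, group: SeedsPerSigner) {
        self.0.push(group);
    }

    pub fn signer_count(&self) -> usize {
        self.0.len()
    }
}
```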

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/sdk-types/src/interface/program/decompression/processor.rs` at line
129, The public field token_seeds uses the confusing type Vec<Vec<Vec<u8>>>;
define a small alias or newtype (e.g., type SeedsPerSigner = Vec<Vec<u8>>; or
pub struct SignerSeedGroups(pub Vec<SeedsPerSigner>);) to represent “one seed
group per signer PDA” and replace token_seeds: Vec<Vec<Vec<u8>>> with
token_seeds: SignerSeedGroups (or the chosen name) and update the other
occurrence referenced (the second use at the later location) to use the new
alias/newtype so the public API is readable and less error-prone.

Comment on lines +151 to +153
// Push one signer seed group per vault PDA for the later Transfer2 CPI.
ctx.token_seeds
.push(seeds.iter().map(|seed| seed.to_vec()).collect());

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== PackedLightAccountVariantTrait surface =="
rg -n -C3 'trait\s+PackedLightAccountVariantTrait|fn\s+into_in_token_data|fn\s+seed_refs_with_bump|fn\s+derive_owner' --type rust

echo
echo "== Places that populate MultiInputTokenDataWithContext.owner =="
rg -n -C4 'MultiInputTokenDataWithContext|owner\s*:' --type rust

echo
echo "== Token owner seed generation / macro expansion inputs =="
rg -n -C4 'owner_seeds|token::owner_seeds|derive_owner\(' --type rust

Repository: Lightprotocol/light-protocol

Length of output: 50386


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full context around lines 140-160 in decompression/token.rs ==="
sed -n '140,160p' sdk-libs/sdk-types/src/interface/program/decompression/token.rs

echo
echo "=== Function signature and broader scope (lines 100-170) ==="
sed -n '100,170p' sdk-libs/sdk-types/src/interface/program/decompression/token.rs

echo
echo "=== How packed.derive_owner() is used ==="
rg -n 'derive_owner\(' sdk-libs/sdk-types/src/interface/program/decompression/token.rs -A2 -B2

echo
echo "=== Into in_token_data usage and return value ==="
rg -n 'into_in_token_data' sdk-libs/sdk-types/src/interface/program/decompression/token.rs -A3 -B1

Repository: Lightprotocol/light-protocol

Length of output: 3779


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check macro generation for owner_seeds vs token_seeds ==="
rg -n 'owner_seeds|token::owner_seeds' sdk-libs/macros/src/ -B3 -A3 | head -100

echo
echo "=== Check how derive_owner is implemented in variant.rs macro ==="
sed -n '499,502p' sdk-libs/macros/src/light_pdas/program/variant_enum.rs

echo
echo "=== Look for owner_derivation construction ==="
rg -n 'owner_derivation' sdk-libs/macros/src/ -B5 -A5 | head -80

echo
echo "=== Check actual implementation of into_in_token_data in token_seeds.rs ==="
sed -n '265,296p' sdk-libs/sdk-types/src/interface/account/token_seeds.rs

Repository: Lightprotocol/light-protocol

Length of output: 12860


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check how owner_derivation is constructed for owner_seeds case ==="
sed -n '419,450p' sdk-libs/macros/src/light_pdas/program/variant_enum.rs

echo
echo "=== Check what token_data.owner is and how it's set ==="
rg -n 'struct.*TokenData|pub owner:' sdk-libs/sdk-types/src/interface/account/token_seeds.rs -A2 -B2

echo
echo "=== Check downstream Transfer2 usage in processor.rs ==="
rg -n 'Transfer2|invoke_signed' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs -B3 -A3

Repository: Lightprotocol/light-protocol

Length of output: 4626


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full owner_derivation code for both backends ==="
sed -n '419,453p' sdk-libs/macros/src/light_pdas/program/variant_enum.rs

echo
echo "=== Confirm: seed_refs_with_bump returns vault seeds, not owner seeds ==="
rg -n 'fn seed_refs_with_bump' sdk-libs/macros/src/light_pdas/program/variant_enum.rs -A10 | head -30

echo
echo "=== How in_token_data.owner is used at line 630 in processor.rs ==="
sed -n '625,655p' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs

Repository: Lightprotocol/light-protocol

Length of output: 4339


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check if owner seeds are stored anywhere separately ==="
rg -n 'owner_seeds|push.*owner' sdk-libs/sdk-types/src/interface/program/decompression/token.rs -B3 -A3

echo
echo "=== Full context: what gets pushed to ctx.token_seeds vs what gets used in Transfer2 ==="
sed -n '148,160p' sdk-libs/sdk-types/src/interface/program/decompression/token.rs

echo
echo "=== Confirm what data.owner actually is (the u8 index) ==="
sed -n '49,56p' sdk-libs/sdk-types/src/interface/account/token_seeds.rs

echo
echo "=== Check if TokenData owner field stores the derived owner pubkey or just an index ==="
rg -n 'struct Token |pub owner:' sdk-libs/sdk-types/src/ --type rust | grep -A1 'pub struct Token'

Repository: Lightprotocol/light-protocol

Length of output: 1307


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find where token_data is created/populated in decompression/token.rs ==="
sed -n '1,50p' sdk-libs/sdk-types/src/interface/program/decompression/token.rs

echo
echo "=== Search for Token struct definition ==="
fd -e rs . sdk-libs/sdk-types/src | xargs rg -l 'pub struct Token[^D]' | head -5

echo
echo "=== Look at PackedTokenData and how owner field is set ==="
rg -n 'owner:.*packed\|owner.*index' sdk-libs/sdk-types/src/interface/program/decompression/ -B2 -A2

Repository: Lightprotocol/light-protocol

Length of output: 1903


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find Token struct definition ==="
rg -n 'pub struct Token' sdk-libs/sdk-types/src/ --type rust -A5

echo
echo "=== How is token_data.owner set - look at into_in_token_data implementations ==="
rg -n 'into_in_token_data' sdk-libs/ --type rust -A15 | head -100

echo
echo "=== Look at create_accounts to understand owner field ==="
sed -n '1,100p' sdk-libs/sdk-types/src/interface/accounts/create_accounts.rs

Repository: Lightprotocol/light-protocol

Length of output: 10408


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check if there's a separate owner_seeds passed to Transfer2 ==="
rg -n 'signer_seed|owner_seed' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs -B2 -A2

echo
echo "=== Full Transfer2 invoke_signed call ==="
sed -n '640,670p' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs

echo
echo "=== What accounts are passed and how is owner marked ==="
sed -n '560,635p' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs | head -60

Repository: Lightprotocol/light-protocol

Length of output: 4180


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check how owner is set in the packed token data vs what derives owner ==="
rg -n 'owner.*index|owner.*as usize' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs -B3 -A3

echo
echo "=== Verify: does the owner account at data.owner actually need to sign? ==="
rg -n 'is_signer.*true' sdk-libs/sdk-types/src/interface/program/decompression/processor.rs -B5 -A2 | head -40

echo
echo "=== Critical: Check an example where owner_seeds might differ from token::seeds ==="
fd -e rs . sdk-tests/ | xargs rg -l 'token::owner_seeds' | head -3

Repository: Lightprotocol/light-protocol

Length of output: 1638


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Example: Check single-token-test for owner_seeds ==="
rg -n 'token::owner_seeds|#\[light_account' sdk-tests/single-token-test/src/lib.rs -B2 -A5

echo
echo "=== Check pinocchio test ==="
rg -n 'token::owner_seeds|#\[light_account' sdk-tests/pinocchio-light-program-test/src/lib.rs -B2 -A5 | head -80

Repository: Lightprotocol/light-protocol

Length of output: 2629


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify: in the macro, does owner_seeds actually create a different PDA? ==="
sed -n '419,453p' sdk-libs/macros/src/light_pdas/program/variant_enum.rs

echo
echo "=== Final confirmation: seed_refs_with_bump vs derive_owner use different seeds ==="
echo "Token seeds example: VAULT_SEED + mint (from token::seeds)"
echo "Owner seeds example: VAULT_AUTH_SEED (from token::owner_seeds)"
echo "These derive DIFFERENT PDAs!"

echo
echo "=== Check: are the stored signer seeds in ctx.token_seeds ever updated to include owner_seeds? ==="
rg -n 'token_seeds|owner_seeds' sdk-libs/sdk-types/src/interface/program/decompression/ --type rust

Repository: Lightprotocol/light-protocol

Length of output: 3757


The stored seed group does not include owner seeds; either pass owner seeds separately or mark the owner PDA as signer only when it matches the vault PDA.

When token::owner_seeds differs from token::seeds, derive_owner() derives a separate PDA. Line 152 stores only the vault PDA seeds in ctx.token_seeds, but line 631 in processor.rs marks account_metas[data.owner as usize + transfer2_packed_start].is_signer = true. If the owner account is a different PDA (which happens when token::owner_seeds ≠ token::seeds), the Transfer2 CPI will fail because there are no signer seeds for it. Either pass owner_seeds separately to invoke_signed, or ensure the marked owner account is actually the vault PDA itself.
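
The second suggested fix can be sketched over plain byte seed groups (the real code would derive and compare the PDAs themselves, not the raw seeds):

```rust
/// Collect one signer seed group per distinct signing PDA: always the vault
/// seeds, plus the owner seeds only when they differ. Illustrative sketch of
/// the suggested fix, not the crate's actual seed-collection code.
fn to_owned_group(seeds: &[&[u8]]) -> Vec<Vec<u8>> {
    seeds.iter().map(|s| s.to_vec()).collect()
}

fn collect_signer_seed_groups(
    vault_seeds: &[&[u8]],
    owner_seeds: &[&[u8]],
) -> Vec<Vec<Vec<u8>>> {
    let mut groups = vec![to_owned_group(vault_seeds)];
    if owner_seeds != vault_seeds {
        // Distinct owner PDA: it needs its own signer seeds for the CPI.
        groups.push(to_owned_group(owner_seeds));
    }
    groups
}
```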

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/sdk-types/src/interface/program/decompression/token.rs` around lines
151 - 153, The code currently only pushes the vault PDA seed group into
ctx.token_seeds (ctx.token_seeds.push(seeds.iter().map(|seed|
seed.to_vec()).collect() in the decompression token flow), but when
token::owner_seeds ≠ token::seeds derive_owner() produces a different owner PDA
and processor.rs later marks the owner account (account_metas[data.owner as
usize + transfer2_packed_start].is_signer = true) without providing signer seeds
for it; fix by including the owner PDA seeds as a separate signer seed group
when owner_seeds differs from seeds (i.e., push token::owner_seeds mapped to
Vec<u8> into ctx.token_seeds in addition to the vault seeds, making sure the
order of seed groups matches how account_metas are constructed for the Transfer2
CPI), or alternatively change the signer-marking logic to only mark the owner as
signer when the owner PDA equals the vault PDA.

@sergeytimoshin sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch 3 times, most recently from 30f6544 to a9b23b9 Compare March 17, 2026 10:27
@sergeytimoshin sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from a9b23b9 to bdb7850 Compare March 17, 2026 11:07
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
forester/src/epoch_manager.rs (1)

570-586: ⚠️ Potential issue | 🔴 Critical

Stop producer tasks before draining new_tree_workers to avoid leaked workers on shutdown.

At Line 709 you start shutdown-draining workers, but Line 570’s guard (which aborts discover_trees_periodically / other producers) is only dropped at function exit. A producer can still register a new worker after the drain starts, leaving it unmanaged during shutdown.

🔧 Suggested ordering fix
-        let _guard = scopeguard::guard(
+        let background_tasks_guard = scopeguard::guard(
             (
                 current_previous_handle,
                 tree_discovery_handle,
                 balance_check_handle,
             ),
             |(h2, h3, h4)| {
@@
-        monitor_handle.abort();
-        self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
+        monitor_handle.abort();
+        drop(background_tasks_guard);
+        self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
             .await;
         result

Also applies to: 707-710

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 570 - 586, The scopeguard that
aborts producer tasks (the guard capturing current_previous_handle,
tree_discovery_handle, balance_check_handle which aborts
discover_trees_periodically and other producers) runs only at function exit, so
producers can spawn new workers while you later drain new_tree_workers; to fix,
ensure producers are stopped before draining: explicitly abort or drop the
producer handles (current_previous_handle.abort(),
tree_discovery_handle.abort(), balance_check_handle.abort() or drop the guard)
immediately prior to the code that drains new_tree_workers (the drain at/around
the previous Line 709 region), so no new workers can be registered during the
drain; keep the scopeguard or a new short-lived guard only for any remaining
cleanup that must run at exit.
♻️ Duplicate comments (3)
prover/client/tests/batch_address_append.rs (2)

127-127: 🧹 Nitpick | 🔵 Trivial

Variable name slightly misleading.

remaining_values takes the full slice &new_element_values[..], which includes all 10 elements—not a "remainder" after warmup. Consider renaming to batch_values or using new_element_values directly for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/tests/batch_address_append.rs` at line 127, The variable
remaining_values is misleading because it is assigned the full slice
&new_element_values[..] (all 10 elements) rather than a true "remainder"; rename
remaining_values to batch_values (or use new_element_values directly) wherever
it's declared and used to accurately reflect it contains the full batch, and
update references in the test (e.g., the binding named remaining_values and any
subsequent uses) to the new name to improve clarity.

108-123: 🧹 Nitpick | 🔵 Trivial

Warmup result intentionally discarded.

The return value from get_batch_address_append_circuit_inputs during warmup is .unwrap()ed but not stored. This is intentional—the warmup's purpose is to mutate sparse_merkle_tree, changelog, and indexed_changelog to simulate prior insertions. The circuit inputs themselves aren't needed.

Consider adding a brief comment to make this intent explicit, e.g.:

// Warmup: mutates sparse_merkle_tree, changelog, indexed_changelog;
// returned circuit inputs are intentionally discarded.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/tests/batch_address_append.rs` around lines 108 - 123, Add a
short inline comment where get_batch_address_append_circuit_inputs::<{
DEFAULT_BATCH_ADDRESS_TREE_HEIGHT as usize }>(...) is called and its result
.unwrap() is discarded to make the intent explicit: state that this call is a
warmup which mutates sparse_merkle_tree, changelog, and indexed_changelog and
that the returned circuit inputs are intentionally ignored. Reference the call
to get_batch_address_append_circuit_inputs and the mutated variables
sparse_merkle_tree, changelog, and indexed_changelog in the comment.
forester/src/epoch_manager.rs (1)

943-953: ⚠️ Potential issue | 🟠 Major

Per-tree dedicated runtime creation still bypasses bounded worker orchestration.

Line 945 + Line 950 still create one OS thread plus a dedicated Tokio runtime per discovered tree. Under bursty tree discovery, this can oversubscribe host resources and degrade slot-time reliability; route this path through the same bounded worker model used in active work processing.

Also applies to: 950-954

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 943 - 953, The current code
spawns a new OS thread and builds a dedicated Tokio runtime inside thread_handle
(using std::thread::spawn and tokio::runtime::Builder in the closure that
references tree_schedule.tree_accounts.merkle_tree, run_id/thread_run_id) which
bypasses the bounded worker pool; instead, remove the per-tree
std::thread::spawn + tokio::runtime::Builder usage and submit the per-tree work
onto the existing bounded worker/orchestrator used for active work processing
(use the shared worker pool API the project already uses to schedule tasks or
spawn blocking work), passing tree_schedule.tree_accounts.merkle_tree and run_id
into that task so we don't create a new runtime or OS thread per discovered
tree.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9d48f8bf-b69c-4a55-bc20-837c6ac8df18

📥 Commits

Reviewing files that changed from the base of the PR and between 64ada13 and bdb7850.

⛔ Files ignored due to path filters (5)
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (29)
  • forester/src/compressible/ctoken/compressor.rs
  • forester/src/compressible/ctoken/state.rs
  • forester/src/compressible/ctoken/types.rs
  • forester/src/compressible/mint/compressor.rs
  • forester/src/compressible/mint/state.rs
  • forester/src/compressible/mint/types.rs
  • forester/src/compressible/pda/compressor.rs
  • forester/src/epoch_manager.rs
  • forester/src/forester_status.rs
  • forester/src/metrics.rs
  • forester/src/priority_fee.rs
  • forester/src/processor/v2/proof_cache.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/tests/e2e_test.rs
  • forester/tests/legacy/batched_state_async_indexer_test.rs
  • forester/tests/test_batch_append_spent.rs
  • forester/tests/test_compressible_ctoken.rs
  • forester/tests/test_compressible_mint.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/tests/batch_address_append.rs
  • prover/client/tests/init_merkle_tree.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs

Comment on lines +22 to +23
#[serde(rename = "lowElementNextIndices")]
pub low_element_next_indices: Vec<String>,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if LowElementNextIndices is defined anywhere in the Go prover code
rg -n "LowElementNextIndices" --type go prover/server/

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check how low_element_next_indices is used in Rust code
rg -n "low_element_next_indices" --type rust prover/client/

Repository: Lightprotocol/light-protocol

Length of output: 2729


🏁 Script executed:

#!/bin/bash
# Check for similar patterns (next_indices, next_values) in both Rust and Go
rg -n "next.*indice|next.*value" --type rust -A 2 -B 2 prover/client/src/proof_types/batch_address_append/ | head -50

Repository: Lightprotocol/light-protocol

Length of output: 5007


🏁 Script executed:

#!/bin/bash
# Check the complete struct definition in Rust to understand all fields
cat -n prover/client/src/proof_types/batch_address_append/json.rs | head -40

Repository: Lightprotocol/light-protocol

Length of output: 1863


🏁 Script executed:

#!/bin/bash
# Check Go struct for similar "Low" fields to understand the pattern
rg -n "struct BatchAddressAppend" --type go -A 20 prover/server/ | head -60

Repository: Lightprotocol/light-protocol

Length of output: 54


Go server schema is missing lowElementNextIndices field—critical data will be lost during deserialization.

The Rust client sends lowElementNextIndices (mapped from low_element_next_indices) as part of the batch address append circuit inputs. This field is active throughout the Rust codebase: it's computed during proof input construction (see proof_inputs.rs:273-276), represents the next index of the low element in the sorted linked list, and is essential for validating that new addresses fit correctly between existing entries.

However, the Go server's BatchAddressAppendParametersJSON struct (at prover/server/prover/v2/marshal_batch_address_append.go) does not define a LowElementNextIndices field. Go's JSON unmarshaler will silently drop this field during deserialization, meaning critical validation data never reaches the circuit.

This is a schema mismatch: the Rust client and Go server disagree on the required fields. The Go server side must be updated to accept and forward this field to the circuit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_address_append/json.rs` around lines 22 -
23, Add the missing LowElementNextIndices field to the
BatchAddressAppendParametersJSON schema so the Go server accepts and forwards
the Rust client's lowElementNextIndices JSON key; update the struct to include a
LowElementNextIndices []string `json:"lowElementNextIndices"` field (and any
corresponding export/serialization logic), then ensure the marshal/unmarshal
helpers and the code that assembles circuit inputs (e.g., wherever
BatchAddressAppendParametersJSON is converted to circuit parameters) propagate
LowElementNextIndices into the circuit input mapping so the Rust
low_element_next_indices values are preserved end-to-end.

Comment on lines +72 to +83
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str
.strip_prefix("0x")
.or_else(|| hex_str.strip_prefix("0X"))
.unwrap_or(hex_str);
let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
.map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
let big_uint_bytes = big_uint.to_bytes_be();
if big_uint_bytes.len() > 32 {
return Err(ProverClientError::InvalidHexString(format!(
"{hex_str}: exceeds 32 bytes"
)));

⚠️ Potential issue | 🟠 Major

Reject oversized hex strings before BigUint parsing.

At Line 77, parsing happens before enforcing the 32-byte bound. A very large hex string can trigger avoidable CPU/memory pressure before you reject it.

🔧 Suggested hardening diff
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
     let trimmed_str = hex_str
         .strip_prefix("0x")
         .or_else(|| hex_str.strip_prefix("0X"))
         .unwrap_or(hex_str);
+    let normalized = trimmed_str.trim_start_matches('0');
+    if normalized.len() > 64 {
+        return Err(ProverClientError::InvalidHexString(format!(
+            "{hex_str}: exceeds 32 bytes"
+        )));
+    }
     let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
         .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str
.strip_prefix("0x")
.or_else(|| hex_str.strip_prefix("0X"))
.unwrap_or(hex_str);
let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
.map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
let big_uint_bytes = big_uint.to_bytes_be();
if big_uint_bytes.len() > 32 {
return Err(ProverClientError::InvalidHexString(format!(
"{hex_str}: exceeds 32 bytes"
)));
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
let trimmed_str = hex_str
.strip_prefix("0x")
.or_else(|| hex_str.strip_prefix("0X"))
.unwrap_or(hex_str);
let normalized = trimmed_str.trim_start_matches('0');
if normalized.len() > 64 {
return Err(ProverClientError::InvalidHexString(format!(
"{hex_str}: exceeds 32 bytes"
)));
}
let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
.map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
let big_uint_bytes = big_uint.to_bytes_be();
if big_uint_bytes.len() > 32 {
return Err(ProverClientError::InvalidHexString(format!(
"{hex_str}: exceeds 32 bytes"
)));
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof.rs` around lines 72 - 83, The function
deserialize_hex_string_to_be_bytes currently parses the entire hex string into a
BigUint before enforcing the 32-byte limit; preemptively validate trimmed_str
length instead to avoid costly parsing of huge inputs: after computing
trimmed_str (from strip_prefix calls) check the hex digit length and return
ProverClientError::InvalidHexString if trimmed_str.len() > 64 (or if it
represents more than 64 hex chars), optionally also handling/normalizing an
odd-length hex string, and only then call num_bigint::BigUint::from_str_radix
and proceed with to_bytes_be().

@sergeytimoshin sergeytimoshin force-pushed the sergey/compressible-clone-cleanup branch from bdb7850 to 72c11ff Compare March 17, 2026 12:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
forester/src/compressible/ctoken/state.rs (1)

86-92: 🧹 Nitpick | 🔵 Trivial

Carry the shared-ownership win to update_from_token.

Nice direction, but this still forces an owned Token at the boundary and only wraps it in Arc at insertion time. If a caller already has an Arc<Token> or wants to share the decoded token, it still pays a full clone before calling this method. Taking impl Into<Arc<Token>> here would make the clone-pressure reduction end-to-end.

Suggested refactor
-    pub fn update_from_token(
-        &self,
-        pubkey: Pubkey,
-        ctoken: Token,
-        lamports: u64,
-        account_size: usize,
-    ) -> Result<()> {
+    pub fn update_from_token(
+        &self,
+        pubkey: Pubkey,
+        ctoken: impl Into<Arc<Token>>,
+        lamports: u64,
+        account_size: usize,
+    ) -> Result<()> {
+        let ctoken = ctoken.into();
+
         // Skip mint accounts - only track actual token accounts
         if !ctoken.is_token_account() {
             debug!("Skipping non-token account {}", pubkey);
             return Ok(());
         }

-        let compressible_slot = match calculate_compressible_slot(&ctoken, lamports, account_size) {
+        let compressible_slot =
+            match calculate_compressible_slot(ctoken.as_ref(), lamports, account_size) {
             Ok(slot) => slot,
             Err(e) => {
                 warn!(
                     "Failed to calculate compressible slot for {}: {}",
                     pubkey, e
@@
         let state = CTokenAccountState {
             pubkey,
-            account: Arc::new(ctoken),
+            account: ctoken,
             lamports,
             compressible_slot,
             is_ata,
         };

Also applies to: 127-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/compressible/ctoken/state.rs` around lines 86 - 92, The method
update_from_token currently accepts an owned Token which forces callers to clone
even if they already have shared ownership; change its signature to accept
token: impl Into<std::sync::Arc<Token>> (or a concrete Arc<Token>) and convert
it to an Arc inside the function so callers can pass either Arc<Token> or Token
without extra clones; apply the same change to the other analogous method that
currently takes Token (the one referenced around the second occurrence) so both
insertion/update paths use Arc<Token> and avoid unnecessary ownership cloning.
forester/src/compressible/ctoken/compressor.rs (1)

49-59: 🧹 Nitpick | 🔵 Trivial

Push the shared-signer boundary into new().

Right now the sharing starts only after construction. Any caller creating multiple compressors from the same payer still has to clone the Keypair up front to satisfy this owned parameter. Accepting Arc<Keypair> — or better, impl Into<Arc<Keypair>> — would make the refactor actually remove those upstream clones too.

Suggested refactor
     pub fn new(
         rpc_pool: Arc<SolanaRpcPool<R>>,
         tracker: Arc<CTokenAccountTracker>,
-        payer_keypair: Keypair,
+        payer_keypair: impl Into<Arc<Keypair>>,
         transaction_policy: TransactionPolicy,
     ) -> Self {
         Self {
             rpc_pool,
             tracker,
-            payer_keypair: Arc::new(payer_keypair),
+            payer_keypair: payer_keypair.into(),
             transaction_policy,
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/compressible/ctoken/compressor.rs` around lines 49 - 59, Change
the CTokenCompressor::new constructor to accept a shared payer Keypair by taking
either Arc<Keypair> or an impl Into<Arc<Keypair>> (instead of an owned Keypair),
and store that Arc directly in the struct's payer_keypair field; update the new
signature and the payer_keypair initialization (referenced as pub fn new and the
payer_keypair field) so callers can pass a cloned Arc rather than cloning the
Keypair itself.
♻️ Duplicate comments (5)
sdk-libs/client/src/local_test_validator.rs (2)

59-60: ⚠️ Potential issue | 🟡 Minor

Return an explicit failure when project root resolution fails.

Right now, None from get_project_root() causes a silent no-op, which can make test failures hard to diagnose.

Suggested fix
-pub async fn spawn_validator(config: LightValidatorConfig) {
-    if let Some(project_root) = get_project_root() {
+pub async fn spawn_validator(config: LightValidatorConfig) -> Result<(), String> {
+    let project_root = get_project_root()
+        .ok_or_else(|| "Failed to determine project root; cannot spawn validator".to_string())?;
+    {
         let project_root = project_root.trim_end_matches(['\n', '\r']);
         let executable = format!("{}/cli/test_bin/run", project_root);
         let mut args = vec!["test-validator".to_string()];
         // ...
-    }
+    }
+    Ok(())
 }

Also applies to: 134-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 59 - 60, When
get_project_root() returns None the code silently does nothing; change
spawn_validator to surface a failure instead of no-op by updating its signature
to return a Result (e.g., pub async fn spawn_validator(config:
LightValidatorConfig) -> anyhow::Result<()>) and handle the None case with an
explicit error (use anyhow::bail! or return Err(anyhow!(...)) with a clear
message like "could not resolve project root"). Apply the same change to the
other similar spawn_validator occurrence so both functions consistently
propagate or return a clear failure when get_project_root() is None.

118-133: ⚠️ Potential issue | 🟠 Major

Tighten non-surfpool process lifecycle checks.

Two reliability gaps remain:

  1. an early successful exit currently passes, even though validator availability is required;
  2. the child handle is dropped, so callers cannot deterministically tear down the process.
Suggested fix direction
-            if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
-                assert!(
-                    status.success(),
-                    "Validator exited early with error: {}",
-                    status
-                );
-            }
+            if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
+                panic!("Validator exited before wait_time elapsed: {}", status);
+            }
+            // Return/store `child` so tests can terminate it during teardown.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 118 - 133, The
current spawn logic allows an early successful exit to be treated as OK and then
drops the child handle, preventing deterministic teardown; change the post-spawn
check so that if child.try_wait() returns Some(_) (any exit) after the initial
sleep it is treated as an error (fail the test or return Err) rather than
allowing a success, and stop dropping the Child: return or store the spawned
Child (the value from Command::new(...).spawn()) to the caller so they can call
wait()/kill() deterministically on teardown (or implement Drop for the wrapper
to kill the child). Concretely, update the block around
Command::new(...).spawn(), replace the assert! that accepts status.success()
with an unconditional error when try_wait() yields Some(status), and change the
function signature/ownership so the spawned Child is returned or retained for
controlled shutdown.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)

2594-2620: ⚠️ Potential issue | 🟠 Major

Early returns in retry loop prevent retries on transient failures.

The ? operators on lines 2597-2601 (reading response body) and line 2607 (build_compressed_proof) cause immediate exit on failure rather than consuming remaining retry attempts. This is inconsistent with how send failures are handled (lines 2616-2618), which properly set last_error and continue.

A truncated HTTP response or temporarily malformed proof payload would skip remaining retries, while a network timeout would correctly retry.


🛠️ Suggested fix to handle all failures uniformly
                 match response_result {
                     Ok(response_result) => {
                         let status = response_result.status();
-                        let body = response_result.text().await.map_err(|error| {
-                            IndexerError::CustomError(format!(
-                                "failed to read prover response body: {error}"
-                            ))
-                        })?;
-
-                        if status.is_success() {
-                            return Ok(ValidityProofWithContext {
-                                accounts: account_proof_inputs,
-                                addresses: address_proof_inputs,
-                                proof: build_compressed_proof(&body)?.into(),
-                            });
-                        }
-
-                        let body_preview: String = body.chars().take(512).collect();
-                        last_error = format!(
-                            "prover returned HTTP {status} for validity proof request: {body_preview}"
-                        );
+                        match response_result.text().await {
+                            Ok(body) => {
+                                if status.is_success() {
+                                    match build_compressed_proof(&body) {
+                                        Ok(proof) => {
+                                            return Ok(ValidityProofWithContext {
+                                                accounts: account_proof_inputs,
+                                                addresses: address_proof_inputs,
+                                                proof: proof.into(),
+                                            });
+                                        }
+                                        Err(e) => {
+                                            let body_preview: String = body.chars().take(512).collect();
+                                            last_error = format!(
+                                                "failed to parse prover proof: {e}; body: {body_preview}"
+                                            );
+                                        }
+                                    }
+                                } else {
+                                    let body_preview: String = body.chars().take(512).collect();
+                                    last_error = format!(
+                                        "prover returned HTTP {status} for validity proof request: {body_preview}"
+                                    );
+                                }
+                            }
+                            Err(error) => {
+                                last_error = format!("failed to read prover response body: {error}");
+                            }
+                        }
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2594 - 2620,
The match arm returns early on failures because the code uses the `?` operator
when reading the response body and when calling `build_compressed_proof`, which
prevents retries; change both to handle errors like the send failures do:
replace `response_result.text().await.map_err(...) ?` with an explicit Result
match (or `match response_result.text().await { Ok(body) => ..., Err(e) => {
last_error = format!("failed to read prover response body: {e}"); continue; }
}`) and wrap `build_compressed_proof(&body)` similarly (e.g. `match
build_compressed_proof(&body) { Ok(p) => p, Err(e) => { last_error =
format!("failed to build compressed proof: {e}"); continue; } }`) so that
`ValidityProofWithContext` (using `account_proof_inputs` and
`address_proof_inputs`) is only returned on success and all transient failures
update `last_error` and allow the retry loop to continue.
forester/src/epoch_manager.rs (1)

943-954: ⚠️ Potential issue | 🟠 Major

Route new-tree processing through the shared worker limiter instead of per-tree runtimes.

Line 945 still spawns one OS thread and Line 950 builds a new 2-thread Tokio runtime per discovered tree. This bypasses max_parallel_tree_workers() and can still oversubscribe CPU/memory during tree discovery bursts.

Also applies to: 945-1015

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 943 - 954, The code currently
spawns a new OS thread (std::thread::spawn) and builds a fresh Tokio runtime
per-tree (Builder::new_multi_thread()...build()) which bypasses the shared
worker limiter; change this to run tree processing on the existing/shared
runtime and gate it with the global worker limiter returned by
max_parallel_tree_workers(): instead of creating thread_handle and a per-tree
runtime, acquire a permit from the shared limiter (e.g. await limiter.acquire/
acquire_owned), then tokio::spawn an async task that receives cancel_rx and uses
completion_tx to signal completion, reusing self_clone/run_id and
tree_schedule/tree_accounts.merkle_tree values; keep the oneshot
cancel_tx/cancel_rx and completion_tx/completion_rx semantics but remove the
per-tree runtime and std::thread::spawn so parallelism is controlled by the
shared limiter.
prover/client/src/proof.rs (1)

72-83: ⚠️ Potential issue | 🟠 Major

Reject oversized hex before BigUint parsing.

At Line 77, parsing happens before enforcing the 32-byte bound. Very large inputs can cause avoidable CPU/memory pressure before rejection.

🔧 Proposed hardening diff
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
     let trimmed_str = hex_str
         .strip_prefix("0x")
         .or_else(|| hex_str.strip_prefix("0X"))
         .unwrap_or(hex_str);
+    let normalized = trimmed_str.trim_start_matches('0');
+    if normalized.len() > 64 {
+        return Err(ProverClientError::InvalidHexString(format!(
+            "{hex_str}: exceeds 32 bytes"
+        )));
+    }
     let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
         .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof.rs` around lines 72 - 83, In
deserialize_hex_string_to_be_bytes, reject inputs that would exceed 32 bytes
before calling BigUint::from_str_radix: after computing trimmed_str (the hex
digits after removing "0x"/"0X"), strip leading '0' characters to find the
significant hex length and if that significant length is greater than 64 hex
chars (or if trimmed_str.len() > 64 and not just leading zeros) return
ProverClientError::InvalidHexString immediately; also handle odd-length hex by
allowing a single leading '0' pad rule when checking length so you still reject
anything representing >32 bytes without invoking big-int parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 2542-2548: The call to check_forester_eligibility is always
passing &tree_accounts.merkle_tree, which is incorrect for TreeType::StateV2
because V2 trees use the queue-based scheduling/eligibility key; compute the
proper eligibility key before the call (e.g., match on tree_accounts.tree_type /
TreeType::StateV2 and select the queue-based scheduling/eligibility key for V2,
otherwise use tree_accounts.merkle_tree) and pass that selected key (instead of
always tree_accounts.merkle_tree) into check_forester_eligibility alongside
epoch_pda, current_light_slot, and epoch_info so V2 trees use the queue-derived
eligible index.

In `@forester/tests/test_batch_append_spent.rs`:
- Around line 352-356: The timeout-based wait on the service JoinHandle can leak
a spawned task (and abort() won't stop spawn_blocking tasks); replace the
timeout(Duration::from_secs(5), service_handle).await pattern with an explicit
tokio::select! that awaits either the service_handle completion or a sleep
timeout, and in the timeout branch explicitly send the graceful shutdown signal
(use the existing shutdown_sender.send(())) and then wait a short additional
grace period for the handle to complete; if it still doesn't finish, fail the
test deterministically or refactor to remove the spawn_blocking wrapper around
run_pipeline::<LightClient>() so the handle can be properly cancelled. Ensure
references to service_handle, shutdown_sender.send(()),
run_pipeline::<LightClient>(), and spawn_blocking are used to locate and update
the code.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 131-142: The current batch validation only checks outer vector
lengths and does not validate each inner Merkle proof length, which leads to a
panic at the later try_into().unwrap() call; update the validation loop (the
section iterating over ("merkle_proofs", merkle_proofs.len())) to iterate each
element of merkle_proofs and verify its depth/length matches batch_size
expectations (or the expected depth for conversion) and if any inner proof has
an invalid length return Err(ProverClientError::GenericError(...)) instead of
allowing try_into().unwrap() to panic; ensure you reference and validate the
exact items used by the conversion that calls try_into().unwrap() so the
conversion can safely use try_into() without unwrap.

In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 28-29: The JSON struct field path_index (pub path_index: u64) and
the builder conversion that produces a u64 (the to_u64() call which currently
returns ProverClientError::IntegerConversion on failure) must be constrained to
the server-expected uint32 range: after converting the BigInt to u64, validate
that the value <= u32::MAX and return a clear ProverClientError if it exceeds
that bound, or else cast/truncate safely to u32 before assigning to path_index
(so the serialized JSON uses a 32-bit-compatible value); update the conversion
logic where input.index_hashed_indexed_element_leaf is converted to u64 to
perform this range check and error handling.

In `@prover/client/tests/init_merkle_tree.rs`:
- Around line 223-225: The test currently calls
.index_hashed_indexed_element_leaf.to_u64().unwrap(), which can panic; change
this to a fallible conversion that returns a Result and propagate or map the
error instead of unwrapping (e.g., replace the unwrap with a call that uses ? or
map_err to produce a descriptive test failure). Locate the use of
index_hashed_indexed_element_leaf and replace the to_u64().unwrap() with a
to_u64() handling path that converts the Option/Result into the test's
Result-returning flow (or maps the conversion error into a test error) so the
test fails deterministically without panicking.

---

Outside diff comments:
In `@forester/src/compressible/ctoken/compressor.rs`:
- Around line 49-59: Change the CTokenCompressor::new constructor to accept a
shared payer Keypair by taking either Arc<Keypair> or an impl Into<Arc<Keypair>>
(instead of an owned Keypair), and store that Arc directly in the struct's
payer_keypair field; update the new signature and the payer_keypair
initialization (referenced as pub fn new and the payer_keypair field) so callers
can pass a cloned Arc rather than cloning the Keypair itself.

In `@forester/src/compressible/ctoken/state.rs`:
- Around line 86-92: The method update_from_token currently accepts an owned
Token which forces callers to clone even if they already have shared ownership;
change its signature to accept token: impl Into<std::sync::Arc<Token>> (or a
concrete Arc<Token>) and convert it to an Arc inside the function so callers can
pass either Arc<Token> or Token without extra clones; apply the same change to
the other analogous method that currently takes Token (the one referenced around
the second occurrence) so both insertion/update paths use Arc<Token> and avoid
unnecessary ownership cloning.

---

Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 943-954: The code currently spawns a new OS thread
(std::thread::spawn) and builds a fresh Tokio runtime per-tree
(Builder::new_multi_thread()...build()) which bypasses the shared worker
limiter; change this to run tree processing on the existing/shared runtime and
gate it with the global worker limiter returned by max_parallel_tree_workers():
instead of creating thread_handle and a per-tree runtime, acquire a permit from
the shared limiter (e.g. await limiter.acquire/ acquire_owned), then
tokio::spawn an async task that receives cancel_rx and uses completion_tx to
signal completion, reusing self_clone/run_id and
tree_schedule/tree_accounts.merkle_tree values; keep the oneshot
cancel_tx/cancel_rx and completion_tx/completion_rx semantics but remove the
per-tree runtime and std::thread::spawn so parallelism is controlled by the
shared limiter.

In `@prover/client/src/proof.rs`:
- Around line 72-83: In deserialize_hex_string_to_be_bytes, reject inputs that
would exceed 32 bytes before calling BigUint::from_str_radix: after computing
trimmed_str (the hex digits after removing "0x"/"0X"), strip leading '0'
characters to find the significant hex length and if that significant length is
greater than 64 hex chars (or if trimmed_str.len() > 64 and not just leading
zeros) return ProverClientError::InvalidHexString immediately; also handle
odd-length hex by allowing a single leading '0' pad rule when checking length so
you still reject anything representing >32 bytes without invoking big-int
parsing.

In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 59-60: When get_project_root() returns None the code silently does
nothing; change spawn_validator to surface a failure instead of no-op by
updating its signature to return a Result (e.g., pub async fn
spawn_validator(config: LightValidatorConfig) -> anyhow::Result<()>) and handle
the None case with an explicit error (use anyhow::bail! or return
Err(anyhow!(...)) with a clear message like "could not resolve project root").
Apply the same change to the other similar spawn_validator occurrence so both
functions consistently propagate or return a clear failure when
get_project_root() is None.
- Around line 118-133: The current spawn logic allows an early successful exit
to be treated as OK and then drops the child handle, preventing deterministic
teardown; change the post-spawn check so that if child.try_wait() returns
Some(_) (any exit) after the initial sleep it is treated as an error (fail the
test or return Err) rather than allowing a success, and stop dropping the Child:
return or store the spawned Child (the value from Command::new(...).spawn()) to
the caller so they can call wait()/kill() deterministically on teardown (or
implement Drop for the wrapper to kill the child). Concretely, update the block
around Command::new(...).spawn(), replace the assert! that accepts
status.success() with an unconditional error when try_wait() yields
Some(status), and change the function signature/ownership so the spawned Child
is returned or retained for controlled shutdown.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2594-2620: The match arm returns early on failures because the
code uses the `?` operator when reading the response body and when calling
`build_compressed_proof`, which prevents retries; change both to handle errors
like the send failures do: replace `response_result.text().await.map_err(...) ?`
with an explicit Result match (or `match response_result.text().await { Ok(body)
=> ..., Err(e) => { last_error = format!("failed to read prover response body:
{e}"); continue; } }`) and wrap `build_compressed_proof(&body)` similarly (e.g.
`match build_compressed_proof(&body) { Ok(p) => p, Err(e) => { last_error =
format!("failed to build compressed proof: {e}"); continue; } }`) so that
`ValidityProofWithContext` (using `account_proof_inputs` and
`address_proof_inputs`) is only returned on success and all transient failures
update `last_error` and allow the retry loop to continue.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 629e56fd-8178-4129-a65e-dfe5832c21a3

📥 Commits

Reviewing files that changed from the base of the PR and between bdb7850 and 72c11ff.

⛔ Files ignored due to path filters (12)
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/ctoken_pda.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/pda_ctoken.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_invocations.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_transfer2.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_deposit.rs is excluded by none and included by none
📒 Files selected for processing (29)
  • forester/src/compressible/ctoken/compressor.rs
  • forester/src/compressible/ctoken/state.rs
  • forester/src/compressible/ctoken/types.rs
  • forester/src/compressible/mint/compressor.rs
  • forester/src/compressible/mint/state.rs
  • forester/src/compressible/mint/types.rs
  • forester/src/compressible/pda/compressor.rs
  • forester/src/epoch_manager.rs
  • forester/src/forester_status.rs
  • forester/src/metrics.rs
  • forester/src/priority_fee.rs
  • forester/src/processor/v2/proof_cache.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/tests/e2e_test.rs
  • forester/tests/legacy/batched_state_async_indexer_test.rs
  • forester/tests/test_batch_append_spent.rs
  • forester/tests/test_compressible_ctoken.rs
  • forester/tests/test_compressible_mint.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/tests/batch_address_append.rs
  • prover/client/tests/init_merkle_tree.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs

Comment on lines +2542 to +2548

            if !self.check_forester_eligibility(
                &epoch_pda,
                current_light_slot,
                &tree_accounts.merkle_tree,
                epoch_info.epoch,
                &epoch_info,
            )? {

⚠️ Potential issue | 🟠 Major

Use the correct eligibility key for V2 tree types.

check_forester_eligibility now always receives tree_accounts.merkle_tree here. For TreeType::StateV2, this can produce a different eligible index than the queue-based scheduling key and lead to false not eligible decisions.

🔧 Proposed fix
-            if !self.check_forester_eligibility(
+            let eligibility_pubkey = match tree_accounts.tree_type {
+                TreeType::StateV2 => tree_accounts.queue,
+                TreeType::AddressV2 => tree_accounts.merkle_tree,
+                _ => tree_accounts.queue,
+            };
+            if !self.check_forester_eligibility(
                 &epoch_pda,
                 current_light_slot,
-                &tree_accounts.merkle_tree,
+                &eligibility_pubkey,
                 epoch_info.epoch,
                 &epoch_info,
             )? {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            if !self.check_forester_eligibility(
-                &epoch_pda,
-                current_light_slot,
-                &tree_accounts.merkle_tree,
-                epoch_info.epoch,
-                &epoch_info,
-            )? {
+            let eligibility_pubkey = match tree_accounts.tree_type {
+                TreeType::StateV2 => tree_accounts.queue,
+                TreeType::AddressV2 => tree_accounts.merkle_tree,
+                _ => tree_accounts.queue,
+            };
+            if !self.check_forester_eligibility(
+                &epoch_pda,
+                current_light_slot,
+                &eligibility_pubkey,
+                epoch_info.epoch,
+                &epoch_info,
+            )? {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 2542 - 2548, The call to
check_forester_eligibility is always passing &tree_accounts.merkle_tree, which
is incorrect for TreeType::StateV2 because V2 trees use the queue-based
scheduling/eligibility key; compute the proper eligibility key before the call
(e.g., match on tree_accounts.tree_type / TreeType::StateV2 and select the
queue-based scheduling/eligibility key for V2, otherwise use
tree_accounts.merkle_tree) and pass that selected key (instead of always
tree_accounts.merkle_tree) into check_forester_eligibility alongside epoch_pda,
current_light_slot, and epoch_info so V2 trees use the queue-derived eligible
index.

Comment on lines +352 to +356
    let join_result = timeout(Duration::from_secs(5), service_handle)
        .await
        .expect("forester service did not shut down within timeout");
    let service_result = join_result.expect("forester service task panicked");
    service_result.expect("run_pipeline::<LightClient>() failed");

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== run_forester shutdown join path ==="
fd test_batch_append_spent.rs --exec sed -n '325,357p' {}

echo
echo "=== long-running spawned loop inside pipeline ==="
fd lib.rs forester/src --exec sed -n '446,461p' {}

Repository: Lightprotocol/light-protocol

Length of output: 2151


🏁 Script executed:

#!/bin/bash
# Verify the exact lines mentioned in the review comment
fd test_batch_append_spent.rs --exec sed -n '352,356p' {}

echo
echo "=== Checking if there are other timeout patterns in tests ==="
rg "timeout\(" forester/tests --type rs -A 2 | head -50

Repository: Lightprotocol/light-protocol

Length of output: 479


🌐 Web query:

tokio::time::timeout behavior when future is dropped on Err(Elapsed)

💡 Result:

tokio::time::timeout(dur, fut).await takes ownership of fut. If it returns Err(tokio::time::error::Elapsed), the wrapped future is considered canceled, which in Tokio’s model means it is dropped (and therefore will never be polled again). Any synchronous cleanup in Drop runs at that point. [1]

Two closely-related nuances from the docs:

  • Tokio checks whether the deadline has elapsed before polling the inner future on each poll; if the inner future does a long CPU-bound stretch without yielding (.await), it can still finish “late” and you can get Ok(_) even though wall-clock time exceeded dur. [1]
  • If you need the original future back, you can only get it while the Timeout is still pending by working with the Timeout value directly and calling Timeout::into_inner(self) (this consumes the Timeout). Once you .await and get Err(Elapsed), the inner future has already been dropped. [1][2]

Sources:
[1] Tokio timeout docs/source (cancellation + “checked before polling” note) https://docs.rs/tokio/latest/src/tokio/time/timeout.rs.html
[2] tokio::time::Timeout API (shows into_inner) https://docs.rs/tokio/latest/tokio/time/struct.Timeout.html


🏁 Script executed:

#!/bin/bash
# Find timeout patterns in the test file without file-type restriction
grep -n "timeout" forester/tests/test_batch_append_spent.rs

echo
echo "=== Check for tokio::select! patterns ==="
grep -r "tokio::select" forester/tests

echo
echo "=== Check for abort() patterns in tests ==="
grep -r "\.abort()" forester/tests

Repository: Lightprotocol/light-protocol

Length of output: 524


🌐 Web query:

tokio JoinHandle mutable reference timeout idiomatic Rust pattern abort on elapsed

💡 Result:

Idiomatic Tokio pattern: race the JoinHandle against a timer, and on timeout call abort().

1) tokio::select! (most direct)

use std::time::Duration;
use tokio::{task::JoinHandle, time};

async fn join_with_timeout<T>(
    mut h: JoinHandle<T>,
    dur: Duration,
) -> Result<T, time::error::Elapsed> {
    tokio::select! {
        res = &mut h => Ok(res.expect("task panicked")),
        _ = time::sleep(dur) => {
            h.abort(); // cooperative cancellation
            Err(time::error::Elapsed::new())
        }
    }
}

&mut h is the normal way to poll the same in-flight future in select! (and you may need to tokio::pin! futures that aren’t Unpin). [2]

2) tokio::time::timeout + async block borrowing the handle

This keeps the handle available so you can abort after timeout fires (because timeout(dur, h) would move h):

use std::time::Duration;
use tokio::{task::JoinHandle, time};

async fn join_with_timeout<T>(
    mut h: JoinHandle<T>,
    dur: Duration,
) -> Result<T, time::error::Elapsed> {
    match time::timeout(dur, async { (&mut h).await }).await {
        Ok(res) => Ok(res.expect("task panicked")),
        Err(e) => { h.abort(); Err(e) }
    }
}

Notes that often matter

  • Dropping a JoinHandle does not cancel the task; it can keep running detached. [1]
  • Aborting is cooperative; awaiting an aborted task typically yields a cancelled JoinError. [1]
  • Tasks spawned with spawn_blocking cannot be aborted once running. [3]

Sources: Tokio JoinHandle docs (abort + detached behavior) [1], Tokio select! tutorial (polling via &mut + pinning) [2], Tokio AbortHandle docs (abort semantics + spawn_blocking caveat) [3].


🏁 Script executed:

#!/bin/bash
# Check how run_forester is called
grep -n "run_forester" forester/tests/test_batch_append_spent.rs | head -20

echo
echo "=== Check test function structure ==="
sed -n '1,50p' forester/tests/test_batch_append_spent.rs

Repository: Lightprotocol/light-protocol

Length of output: 1840


🌐 Web query:

tokio spawn_blocking abort behavior why cannot abort blocking tasks

💡 Result:

In Tokio, JoinHandle::abort() is cooperative cancellation: it cancels an async task the next time the task is polled and reaches an .await (a yield point). If the task is running CPU code between .awaits (or stuck in a busy loop), Tokio can’t safely “interrupt” it; it can only stop polling it and drop the future when possible. [2]

spawn_blocking is different: it runs your closure on Tokio’s blocking thread pool as normal synchronous code (not an async future). Once that closure is running on an OS thread, Tokio has no safe, general way to forcibly stop it (Rust/std don’t provide safe “kill this thread now” semantics, and doing so could leak locks/resources or violate invariants). Therefore, Tokio documents that spawn_blocking tasks cannot be aborted—calling abort() “will not have any effect, and the task will continue running normally.” [1]

What to do instead (common patterns):

  • Make the blocking function cooperatively cancellable: pass in an AtomicBool/CancellationToken-like flag and check it periodically inside the closure, returning early.
  • Use timeouts around the waiting side (tokio::time::timeout(handle.await)), understanding this only stops waiting; it does not stop the blocking work itself.
  • For truly forceful termination, run the work in a separate process you can kill (heavier, but actually preemptive).

Sources: Tokio spawn_blocking docs [1] and Tokio maintainers’ explanation of abort() behavior [2].

[1] https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html
[2] tokio-rs/tokio#5534


🏁 Script executed:

#!/bin/bash
# Find where shutdown_receiver is used in run_pipeline
rg "shutdown_receiver" forester/src/lib.rs -A 3 -B 1

echo
echo "=== Check if there's cooperative shutdown handling ==="
rg "is_shutdown" forester/src -A 2 | head -40

Repository: Lightprotocol/light-protocol

Length of output: 127


🌐 Web query:

tokio spawn_blocking cooperative cancellation pattern shutdown signal

💡 Result:

Key constraint: spawn_blocking is not force-cancellable

Tokio’s cancellation via JoinHandle::abort() does not stop work spawned with tokio::task::spawn_blocking; the blocking closure will keep running until it returns on its own. [1]

Tokio’s own spawn_blocking docs explicitly recommend a cooperative cancellation token pattern for this. [2]


Cooperative cancellation pattern (shutdown signal → token → blocking loop checks)

Use:

  • an async shutdown signal (e.g. tokio::signal::ctrl_c()),
  • a shared CancellationToken,
  • and (optionally) a TaskTracker to wait for tasks to finish cleanup.

Tokio’s graceful shutdown guide shows this general approach with cancellation tokens. [3]

Example

use tokio_util::sync::CancellationToken;
use tokio_util::task::TaskTracker;

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        let token = CancellationToken::new();
        let tracker = TaskTracker::new();

        // Spawn blocking worker (cannot be aborted; must cooperate).
        let worker_token = token.clone();
        tracker.spawn_blocking(move || {
            while !worker_token.is_cancelled() {
                // do a *bounded* chunk of blocking work
                do_one_unit_of_work();

                // important: loop back and check cancellation frequently
            }

            // optional: cleanup here
            cleanup();
        });

        // Shutdown trigger (Ctrl+C shown here)
        tokio::select! {
            _ = tokio::signal::ctrl_c() => {
                token.cancel(); // request cooperative stop
            }
        }

        // Stop accepting new work (optional) and wait for tasks to exit
        tracker.close();
        tracker.wait().await;
    });
}

fn do_one_unit_of_work() {
    // keep it finite: avoid "block forever" calls if you want timely shutdown
}
fn cleanup() {}

Why TaskTracker: it’s designed to pair a CancellationToken (signal) with waiting for tracked tasks to finish. [4]


Practical rules for spawn_blocking shutdown

  1. Make blocking work interruptible: structure it as “small units” so the loop can re-check is_cancelled() frequently. [2][4]
  2. Avoid indefinite blocking syscalls (or wrap them in something with timeouts), otherwise the thread may not observe cancellation promptly.
  3. Use JoinHandle::abort() for async tasks (it cancels at .await yield points), but don’t rely on it for spawn_blocking. [1][5]

Sources

[1] https://docs.rs/tokio/latest/tokio/task/struct.AbortHandle.html
[2] https://strawlab.org/strand-braid-api-docs/latest/src/tokio/task/blocking.rs.html
[3] https://tokio.rs/tokio/topics/shutdown
[4] https://doc.cuprate.org/tokio_util/task/task_tracker/struct.TaskTracker.html
[5] tokio-rs/tokio#5534


🏁 Script executed:

#!/bin/bash
# Search for where shutdown_receiver is actually used in run_pipeline
rg "shutdown_receiver" forester/src -B 5 -A 10

echo
echo "=== Check the signature and start of run_pipeline ==="
fd lib.rs forester/src --exec sed -n '1,100p' {} | grep -A 30 "async fn run_pipeline"

Repository: Lightprotocol/light-protocol

Length of output: 1845


🏁 Script executed:

#!/bin/bash
# Find run_pipeline function definition
rg "pub async fn run_pipeline" forester/src -A 50 | head -80

echo
echo "=== Search for where shutdown_receiver is actually consumed ==="
rg "shutdown_receiver" forester --type rust -A 5 | head -60

Repository: Lightprotocol/light-protocol

Length of output: 8525


🏁 Script executed:

#!/bin/bash
# Find the actual implementation of run_pipeline_with_run_id and how it uses shutdown_service
sed -n '200,600p' forester/src/lib.rs | grep -A 200 "pub async fn run_pipeline_with_run_id"

echo
echo "=== Search for shutdown_service usage in lib.rs ==="
rg "shutdown_service" forester/src/lib.rs -A 3 -B 1

Repository: Lightprotocol/light-protocol

Length of output: 5348


🏁 Script executed:

#!/bin/bash
# Find run_service function to see how shutdown_service is consumed
rg "async fn run_service" forester/src/lib.rs -A 80 | head -120

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Search for run_service definition more broadly
rg "fn run_service" forester/src -A 50 | head -100

echo
echo "=== Also check if shutdown_service is awaited ==="
rg "shutdown_service" forester/src/lib.rs | cat

Repository: Lightprotocol/light-protocol

Length of output: 3761


🏁 Script executed:

#!/bin/bash
# Let me trace how shutdown is actually used in run_service
sed -n '1,200p' forester/src/epoch_manager.rs | grep -A 150 "pub async fn run_service"

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Find the line number of run_service first
grep -n "pub async fn run_service" forester/src/epoch_manager.rs

echo
echo "=== Get context around run_service to see shutdown usage ==="
rg "shutdown" forester/src/epoch_manager.rs -B 2 -A 5 | head -100

Repository: Lightprotocol/light-protocol

Length of output: 3604


🏁 Script executed:

#!/bin/bash
# Search for where request_shutdown is called
rg "request_shutdown" forester/src/epoch_manager.rs -B 3 -A 3

echo
echo "=== Search for where shutdown receiver is actually awaited ==="
sed -n '4527,4700p' forester/src/epoch_manager.rs | grep -A 30 "shutdown"

Repository: Lightprotocol/light-protocol

Length of output: 5512


🏁 Script executed:

#!/bin/bash
# Now I understand - shutdown is properly used in select! with &mut
# But the issue is about the spawn_blocking task not being aborted when timeout occurs
# Let me verify what the actual behavior is in the test

echo "=== Examining the test's spawn_blocking task closure ==="
sed -n '333,351p' forester/tests/test_batch_append_spent.rs

echo
echo "=== Check if shutdown propagates properly into the blocking task ==="
sed -n '333,345p' forester/tests/test_batch_append_spent.rs

Repository: Lightprotocol/light-protocol

Length of output: 1177


🏁 Script executed:

#!/bin/bash
# Final verification: confirm that timeout dropping the handle doesn't terminate the task
# by checking what happens when the test is run

echo "=== Check if there are any test failures or issues related to timeout ==="
rg "forester service did not shut down within timeout" forester --type rust

echo
echo "=== Search for any task detachment or leakage issues in test results ==="
git log --oneline --all -20 -- "forester/tests/test_batch_append_spent.rs" 2>/dev/null || echo "Git history not available"

echo
echo "=== Check the actual behavior - does test panic or just timeout? ==="
sed -n '352,358p' forester/tests/test_batch_append_spent.rs

Repository: Lightprotocol/light-protocol

Length of output: 1448


The timeout path risks task leakage, but the suggested fix won't work for spawn_blocking.

The concern is valid: when timeout() elapses (line 352), the JoinHandle is dropped without terminating the underlying task. The task continues running detached into subsequent tests. However, the suggested fix—calling .abort() on the handle—will not terminate a spawn_blocking task. Tokio's spawn_blocking runs synchronous code on an OS thread pool and cannot be forcibly aborted; abort() has no effect.

The codebase already sends graceful shutdown signals (shutdown_sender.send(()) at line 351) which propagate through select! statements in the pipeline. This is good. But if graceful shutdown fails (e.g., the pipeline hangs, ignores signals, or hits a bug), the timeout elapses and the task leaks—still a problem.

Recommendation: Use tokio::select! instead of timeout() to make the cancellation path explicit and ensure cleanup on timeout. Alternatively, refactor to avoid the spawn_blocking wrapper entirely, since run_pipeline() is already async-aware.
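The recommended control flow boils down to: wait for completion, on timeout send the graceful shutdown signal, then allow one short grace period before failing deterministically. A minimal std-thread analogue of that flow (the actual fix would use tokio::select! with tokio::time::sleep; wait_with_grace and the channel names here are hypothetical illustrations, not the test's API):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of waiting on a worker with a graceful-shutdown fallback.
#[derive(Debug, PartialEq)]
enum Shutdown {
    Clean,      // finished within the initial timeout
    AfterGrace, // finished only after the shutdown signal
    TimedOut,   // never finished; fail the test here
}

/// Wait for `done_rx` to fire; on timeout, send a shutdown signal
/// (mirroring shutdown_sender.send(())) and allow one extra grace
/// period before giving up.
fn wait_with_grace(
    done_rx: &mpsc::Receiver<()>,
    shutdown_tx: &mpsc::Sender<()>,
    timeout: Duration,
    grace: Duration,
) -> Shutdown {
    match done_rx.recv_timeout(timeout) {
        Ok(()) => Shutdown::Clean,
        Err(mpsc::RecvTimeoutError::Timeout) => {
            // Explicit cancellation path: signal, then wait a bounded grace period.
            let _ = shutdown_tx.send(());
            match done_rx.recv_timeout(grace) {
                Ok(()) => Shutdown::AfterGrace,
                Err(_) => Shutdown::TimedOut,
            }
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => Shutdown::TimedOut,
    }
}
```

The key property is that the timeout branch is explicit, so the test can assert a deterministic failure instead of silently leaking a detached task.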

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/tests/test_batch_append_spent.rs` around lines 352 - 356, The
timeout-based wait on the service JoinHandle can leak a spawned task (and
abort() won't stop spawn_blocking tasks); replace the
timeout(Duration::from_secs(5), service_handle).await pattern with an explicit
tokio::select! that awaits either the service_handle completion or a sleep
timeout, and in the timeout branch explicitly send the graceful shutdown signal
(use the existing shutdown_sender.send(())) and then wait a short additional
grace period for the handle to complete; if it still doesn't finish, fail the
test deterministically or refactor to remove the spawn_blocking wrapper around
run_pipeline::<LightClient>() so the handle can be properly cancelled. Ensure
references to service_handle, shutdown_sender.send(()),
run_pipeline::<LightClient>(), and spawn_blocking are used to locate and update
the code.

Comment on lines +131 to +142
let batch_len = batch_size as usize;
for (name, len) in [
("old_leaves", old_leaves.len()),
("leaves", leaves.len()),
("merkle_proofs", merkle_proofs.len()),
] {
if len != batch_len {
return Err(ProverClientError::GenericError(format!(
"invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
)));
}
}

⚠️ Potential issue | 🟠 Major

Validate each Merkle proof depth before conversion.

The new guard checks only batch vector lengths. A malformed inner proof length still panics at try_into().unwrap() (Line 193). Return an error instead of allowing panic.

🔧 Proposed fix
     for (name, len) in [
         ("old_leaves", old_leaves.len()),
         ("leaves", leaves.len()),
         ("merkle_proofs", merkle_proofs.len()),
     ] {
         if len != batch_len {
             return Err(ProverClientError::GenericError(format!(
                 "invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
             )));
         }
     }
+    if let Some((i, proof)) = merkle_proofs
+        .iter()
+        .enumerate()
+        .find(|(_, proof)| proof.len() != HEIGHT)
+    {
+        return Err(ProverClientError::GenericError(format!(
+            "invalid batch append inputs: merkle_proofs[{i}] len {} != expected height {HEIGHT}",
+            proof.len()
+        )));
+    }
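The same depth check can also be folded into the conversion itself, so the later try_into() needs no unwrap at all. A sketch under assumed types (HEIGHT = 4 stands in for the real tree height, to_fixed_proofs is a hypothetical helper, and the real code would return ProverClientError rather than String):

```rust
/// Assumed stand-in for the prover's tree height constant.
const HEIGHT: usize = 4;

/// Convert variable-length Merkle proofs into fixed-depth arrays,
/// returning an error for any malformed inner proof instead of panicking.
fn to_fixed_proofs(
    merkle_proofs: Vec<Vec<[u8; 32]>>,
) -> Result<Vec<[[u8; 32]; HEIGHT]>, String> {
    merkle_proofs
        .into_iter()
        .enumerate()
        .map(|(i, proof)| {
            let len = proof.len();
            // TryFrom<Vec<T>> for [T; N] fails (returning the Vec) on a
            // length mismatch; map that into a descriptive error.
            proof.try_into().map_err(|_| {
                format!("merkle_proofs[{i}] len {len} != expected height {HEIGHT}")
            })
        })
        .collect()
}
```

Because the per-element conversion is fallible, the separate up-front scan and the unwrap at the conversion site collapse into one error path.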
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 131
- 142, The current batch validation only checks outer vector lengths and does
not validate each inner Merkle proof length, which leads to a panic at the later
try_into().unwrap() call; update the validation loop (the section iterating over
("merkle_proofs", merkle_proofs.len())) to iterate each element of merkle_proofs
and verify its depth/length matches batch_size expectations (or the expected
depth for conversion) and if any inner proof has an invalid length return
Err(ProverClientError::GenericError(...)) instead of allowing
try_into().unwrap() to panic; ensure you reference and validate the exact items
used by the conversion that calls try_into().unwrap() so the conversion can
safely use try_into() without unwrap.

Comment on lines +28 to 29
pub path_index: u64,


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking pathIndex field types across client/server..."
rg -n --type=go 'PathIndex\s+u?int(32|64)|json:"pathIndex"' -C2
rg -n --type=rust 'path_index:\s*u(32|64)|pathIndex' -C2

echo
echo "Expected: client and server use the same width for pathIndex."
echo "If server remains uint32, clamp/validate client values to u32::MAX or update server schema."



🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Checking non_inclusion v2 JSON builder conversion logic ==="
# Find the full file to see lines 54-62 and builder implementation
wc -l prover/client/src/proof_types/non_inclusion/v2/json.rs
echo
echo "=== Full file content ==="
cat -n prover/client/src/proof_types/non_inclusion/v2/json.rs

echo
echo "=== Searching for From/TryFrom implementations for NonInclusionInputsJSON ==="
ast-grep --pattern 'impl.*From.*NonInclusionInputsJSON {
  $$$
}'

echo
echo "=== Searching for IntegerConversion or path_index conversion ==="
rg -n 'IntegerConversion|path_index|PathIndex' prover/client/src/proof_types/non_inclusion/v2/ -A2 -B2



Schema mismatch: client serializes pathIndex as u64 while server expects uint32.

The builder converts to u64 (lines 54–62) and the JSON struct field is declared as u64 (line 28), but all server-side schemas (prover/server/prover/v2/marshal_non_inclusion.go and all v1 variants) define PathIndex uint32. Values above u32::MAX will be serialized to JSON and fail during server unmarshaling.

The conversion from BigInt to u64 is safely handled with to_u64().ok_or_else(...) returning IntegerConversion on failure, but there's no validation that the resulting u64 fits within uint32 bounds before it reaches the server.

Conversion logic (lines 54–62)
path_index: input
    .index_hashed_indexed_element_leaf
    .to_u64()
    .ok_or_else(|| {
        ProverClientError::IntegerConversion(format!(
            "failed to convert path index {} to u64",
            input.index_hashed_indexed_element_leaf
        ))
    })?,

Either clamp the field to u32 before serialization, validate that the value is ≤ u32::MAX, or update the server schema to accept uint64.
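Of the three options, the client-side range check is the least invasive, since it needs no server change. A sketch (path_index_to_u32 is a hypothetical helper name; the real code would return ProverClientError::IntegerConversion rather than String):

```rust
/// Reject values that would not survive the server's uint32 unmarshaling,
/// instead of serializing them and failing on the server side.
fn path_index_to_u32(value: u64) -> Result<u32, String> {
    u32::try_from(value).map_err(|_| {
        format!(
            "path index {value} exceeds u32::MAX ({}) expected by the server schema",
            u32::MAX
        )
    })
}
```

Keeping the JSON field as u32 after this check also makes the schema mismatch impossible to reintroduce silently, because the type system enforces the bound at the serialization boundary.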

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/non_inclusion/v2/json.rs` around lines 28 - 29,
The JSON struct field path_index (pub path_index: u64) and the builder
conversion that produces a u64 (the to_u64() call which currently returns
ProverClientError::IntegerConversion on failure) must be constrained to the
server-expected uint32 range: after converting the BigInt to u64, validate that
the value <= u32::MAX and return a clear ProverClientError if it exceeds that
bound, or else cast/truncate safely to u32 before assigning to path_index (so
the serialized JSON uses a 32-bit-compatible value); update the conversion logic
where input.index_hashed_indexed_element_leaf is converted to u64 to perform
this range check and error handling.

Comment on lines 223 to 225
.index_hashed_indexed_element_leaf
-    .to_u32()
+    .to_u64()
     .unwrap(),

⚠️ Potential issue | 🟡 Minor

Avoid unwrap() in this fallible path.

Line 225 can still panic on conversion failure even though surrounding code uses Result. Prefer explicit error mapping for deterministic test failures.

🔧 Proposed fix
-        path_index: merkle_inputs
-            .index_hashed_indexed_element_leaf
-            .to_u64()
-            .unwrap(),
+        path_index: merkle_inputs
+            .index_hashed_indexed_element_leaf
+            .to_u64()
+            .ok_or_else(|| {
+                ProverClientError::IntegerConversion(
+                    "failed to convert index_hashed_indexed_element_leaf to u64".to_string(),
+                )
+            })?,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/tests/init_merkle_tree.rs` around lines 223 - 225, The test
currently calls .index_hashed_indexed_element_leaf.to_u64().unwrap(), which can
panic; change this to a fallible conversion that returns a Result and propagate
or map the error instead of unwrapping (e.g., replace the unwrap with a call
that uses ? or map_err to produce a descriptive test failure). Locate the use of
index_hashed_indexed_element_leaf and replace the to_u64().unwrap() with a
to_u64() handling path that converts the Option/Result into the test's
Result-returning flow (or maps the conversion error into a test error) so the
test fails deterministically without panicking.
